Introduction

This report describes the methods used to locate, clean, and collate data for the Metchosin Insect Biomass Project (2018-2023), as completed during a Directed Studies project by Larissa Bron in Fall 2024. The format of this report is R Markdown which integrates the research narrative with embedded code and data outputs. Note: Click the “show” button on the right side of the document if you want to see the code used to generate any of the outputs.

The motivation for this directed studies project was to find data that could be used to help explain the spatiotemporal variation in insect biomass in Metchosin using mathematical modelling, building onto the work of previous student researchers (e.g. Innes (2024) and Nikel (2019)). Student research has contributed to understanding the variation in insect biomass in Metchosin by testing explanatory variables (Nikel, 2019) and considering whether significant differences exist between insect collection traps or sampling years (Innes, 2024).

Nikel (2019) assessed whether elevation, spatial (distance from collection trap to coastline or urban area), or temporal (collection period) explanatory variables could be attributed to insect biomass counts in 2018. The short sampling period (one year) and coarse spatial variables contributed to low explanatory value.

Innes (2024) illustrated trends in biomass compared between insect orders, trap number, and time using descriptive statistics. This resulted in highlighting the greatest contribution to biomass was from Diptera and Hymenoptera. As well, significance testing with Kruskal-Wallis and Dunn tests was used to consider differences between yearly and monthly sampling intervals between 2018 and 2022. The result from significance testing was no significant difference between years, and some significance between summer and fall months.

Still, the question remains: What factors are contributing to the differences in insect biomass observed in Metchosin between 2018-2023? To answer this, I sourced publicly accessible 1 metre resolution land cover data (Capital Regional District) and daily climate data (Environment Canada), then cleaned and aggregated this data in different compositions to answer this question from a variety of perspectives relevant to insect biodiversity and conservation.

Selecting Variables for Modelling

I conducted a mini literature review spanning recent academic literature that explores, experiments on, and observes insect biomass trends globally. The results of this literature review were used to identify the categories of threats to insects, which in turn were then used to guide a data search to source explanatory variables for modelling insect trends observed between 2018 to 2023.

Primary Threats to Insects

The primary threats to insects are habitat loss, chemical and light pollution, pathogens and introduced species, and climate change (Bluthgen et al., 2022; Müller et al., 2024; Uhler et al., 2021). These threats were used to guide a review of available public data to assess potential covariates for modelling (Table 1).

Table 1: Publicly accessible data relating to the primary threats to insects.

Threat to Insects Available Data
Habitat Loss

Land Cover

Protected Areas

Proximity to Water

Pollution

Light Pollution

Agricultural Output

Pathogens and Introduced Species Not easily accessible, potentially vague.
Climate Change

Weather

The research team reviewed the available data (Table 1) and chose climate and habitat as explanatory variable categories for modelling. I then revisited the literature review to assess how researchers incorporate weather and land cover data into modelling (Table 2).

Table 2: Insect research that incorporates climate and landscape variables in modelling.

Research Climate and Weather Landscape and Habitat
Gebert et al. (2023)

Temperature and precipitation.

Mean and standard deviation for both were calculated from March to July using monthly data.

Yearly mean, median, minimum, and standard deviation of NDVI (Normalized Difference Vegetation Index) at buffers of 50 metres and 200 metres to assess vegetation cover and vegetation activity.
Hallmann et al. (2017)

Mean daily temperature, precipitation, and wind speed decomposed into long-term average trends (between years), mean seasonal trends, and yearly seasonal anomaly components.

Number of frost days and sum of precipitation in the months November-February preceding a sampling season.

Land use and change over time derived from aerial photos between 1989-1994 and 2012-2015 at a 200 metre radius.

Land use: forest, agricultural area, natural grassland, and surface water.

Habitat data: plant inventories within 50 metres for plant species richness each year.

Ellenberg nitrogen, pH, light, temperature, and moisture each year.

Hallmann et al. (2019) Temperature, the sum of precipitation, mean relative moisture content, and mean wind speed. n/a
Li and Wilkins (2022)

Temperature.

Wind speed was considered but not found relevant.

Spatial complexity (open versus cluttered).

Considered but irrelevant: site distance to water, number of trees within a 50 kilometre radius, site ground cover (concrete versus soil), and site distance to interstate highway.

Müller et al. (2024)

The weather during the sampling period: temperature, precipitation.

Weather anomalies for temperature and precipitation of: winter, April current, April previous, and month previous.

Number of herb species and tree species.

Ellenberg value for light and temperature.

The proportion of arable land, forest, grassland, and water.

Seibold et al. (2019) Mean winter temperature and precipitation during the growing period. Cover of arable fields and grasslands within 1000 metres.
Svenningsen et al. (2024) n/a Urban and farmland cover within 1000 metres on each side of the road (vehicle sampling).
Uhler et al. (2021)

Temperature and humidity at local scale.

Annual mean temperature and precipitation for macro scale.

Climate normals and deviations.

Land use is aggregated at local and landscape scales.
Uphus et al. (2023) Mean temperature from March and April, and the average daily mean air temperature and humidity.

Spring green-up for 100, 200, 500, 1000, 2000, and 3000 metres around each plot.

Land use: forest, grassland, cropland, urbanized areas.

Welti et al. (2021) Monthly means of maximum and minimum temperatures and monthly cumulative precipitation.

Land use: urban, intensive agriculture, non-irrigated agriculture, pasture/orchard, forest, grassland/shrubland, freshwater, and saltwater.

Elevation.

The research team then met to discuss incorporating climate and land cover data into modelling. We considered land cover extracted at increasing spatial scales (e.g. Gebert et al., 2023; Seibold et al., 2019; Svenningsen et al., 2024; Uhler et al., 2021) and weather data aggregated in different temporal configurations (e.g. Müller 2024; Seibold et al., 2019; Welti et al., 2021). The resulting land cover and weather variables were chosen for correlation testing and model building:

  • Climate (daily weather station data (Environment Canada))
    • Sampling Period (“Month” which is biweekly sample collection)
      • Mean of the mean daily temperature.
      • Mean maximum daily temperature.
      • Mean minimum daily temperature.
      • Mean daily total precipitation.
      • Mean maximum daily wind gust.
      • Total precipitation.
    • Year
      • Mean daily temperature.

      • Number of days below 0°C.

      • Number of days above 25°C.

      • Total precipitation.

      • Previous year winter precipitation (November-February).

      • Previous year winter temperature (November-February).

      • Current year growing season precipitation (March-October).

  • Land Cover (1 metre resolution land cover mapping (Capital Regional District)
    • Extracted at 50, 100, 250, and 500 metre (circle diameter) buffers around each insect trap.

Acquisition and Preparation of Data

The following sections of the report are in R Markdown format - describing the steps involved in accessing, preparing, and collating climate and land cover data for use as explanatory variables in modelling to help explain spatiotemporal insect biomass trends.

The next two code chunks correspond to preparing the R environment for data work.

install.packages("tidyverse")
install.packages("dplyr")
install.packages("lubridate")
install.packages("purr")
install.packages("PerformanceAnalytics")
library("tidyverse") # keep things working together nicely
library("dplyr") # keep things working together nicely 
library("lubridate") # date data support
library("purrr") # iteration
library('PerformanceAnalytics') # for correlation matrix

Uploading Data

Metchosin Insect Count Data

This insect count data was prepared by Freya Innes and then Larissa Bron manually added the beginning and end of biweekly sampling intervals (range_begin, range_end) in Excel, resulting in the Insect Count Data (https://github.com/larissaissabron/MetchosinInsectBiomass/blob/main/Metchosin%20Project%20Metchosin%20Data.csv).

Variable Description
trap_number Malaise trap site number
year Year insects collected in trap
month Biweekly sampling interval in which insects were collected in trap
range_begin First day of the biweekly sampling interval, “month”
range_end Last day of the biweekly sampling interval, “month”
collection_date Date the insect trap contents from the preceding biweekly collection interval were collected
group Major flying insect orders, one of: Hymenoptera, Diptera, Lepidoptera, Hemiptera, Coleoptera, or Other
biomass Biomass measured per insect order (“group”)
biomass_total Biomass summed for all insect orders (“groups”) per biweekly sampling interval (“month”)
percent_biomass Contribution of each insect order (“group”) to the total biomass per biweekly sampling interval (“month”)
individual_count Number of individual insects counted for each insect order (“group”) per biweekly sampling interval (“month”)
count_total Individual count summed for all insect orders (“groups”) per biweekly sampling interval (“month”)

The insect count data was uploaded to R and prepared to streamline the workflow. Descriptions of the insect count data follow.

# Import .csv file of insect count data
insect_counts <- read_csv("data/Metchosin Project Metchosin Data.csv") %>% 
  # change all column names to be lower case to assist in joining and working between datasets later
    setNames(
    names(.) %>% 
      tolower()) %>% 
  # convert trap_number to a factor so that it is labelled appropriately in graphics
  mutate_at(c('trap_number'), as.factor)
## Rows: 6078 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): Month, Collection_Date, Group
## dbl  (7): Trap_Number, Year, Biomass, Biomass_Total, Percent_Biomass, Indivi...
## date (2): Range_Begin, Range_End
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Change the format of collection date to be what r recognizes as dates
insect_counts <- insect_counts %>% 
  mutate(collection_date = dmy(collection_date)) 

# Separate collection date components to get a month for calculating monthly climate data
insect_counts <- insect_counts %>% 
  separate(collection_date, c('collect_year', 'collect_month', 'collect_day'), sep = "-",remove = FALSE)

# year, month, day to numeric instead of character
insect_counts$collect_year <- as.numeric(insect_counts$collect_year)
insect_counts$collect_month <- as.numeric(insect_counts$collect_month)
insect_counts$collect_day <- as.numeric(insect_counts$collect_day)

# change format from a dataframe to tibble to assist R in working with large dataset 
as_tibble(insect_counts)
## # A tibble: 6,078 × 15
##    trap_number  year month   range_begin range_end  collection_date collect_year
##    <fct>       <dbl> <chr>   <date>      <date>     <date>                 <dbl>
##  1 1            2018 (July … 2018-07-06  2018-07-20 2018-07-20              2018
##  2 1            2018 (July … 2018-07-06  2018-07-20 2018-07-20              2018
##  3 1            2018 (July … 2018-07-06  2018-07-20 2018-07-20              2018
##  4 1            2018 (July … 2018-07-06  2018-07-20 2018-07-20              2018
##  5 1            2018 (July … 2018-07-06  2018-07-20 2018-07-20              2018
##  6 1            2018 (July … 2018-07-06  2018-07-20 2018-07-20              2018
##  7 2            2018 (July … 2018-07-06  2018-07-20 2018-07-20              2018
##  8 2            2018 (July … 2018-07-06  2018-07-20 2018-07-20              2018
##  9 2            2018 (July … 2018-07-06  2018-07-20 2018-07-20              2018
## 10 2            2018 (July … 2018-07-06  2018-07-20 2018-07-20              2018
## # ℹ 6,068 more rows
## # ℹ 8 more variables: collect_month <dbl>, collect_day <dbl>, group <chr>,
## #   biomass <dbl>, biomass_total <dbl>, percent_biomass <dbl>,
## #   individual_count <dbl>, count_total <dbl>
# check out imported data to make sure nothing obviously wrong
summary(insect_counts)
##   trap_number        year         month            range_begin        
##  14     : 414   Min.   :2018   Length:6078        Min.   :2018-07-06  
##  9      : 408   1st Qu.:2019   Class :character   1st Qu.:2019-08-08  
##  12     : 408   Median :2020   Mode  :character   Median :2020-09-01  
##  13     : 408   Mean   :2020                      Mean   :2020-11-28  
##  4      : 402   3rd Qu.:2022                      3rd Qu.:2022-05-01  
##  5      : 396   Max.   :2023                      Max.   :2023-10-02  
##  (Other):3642                                                         
##    range_end          collection_date       collect_year  collect_month   
##  Min.   :2018-07-20   Min.   :2018-07-20   Min.   :2018   Min.   : 5.000  
##  1st Qu.:2019-08-21   1st Qu.:2019-08-21   1st Qu.:2019   1st Qu.: 6.000  
##  Median :2020-09-14   Median :2020-09-14   Median :2020   Median : 8.000  
##  Mean   :2020-12-10   Mean   :2020-12-11   Mean   :2020   Mean   : 7.694  
##  3rd Qu.:2022-05-14   3rd Qu.:2022-05-14   3rd Qu.:2022   3rd Qu.: 9.000  
##  Max.   :2023-10-15   Max.   :2023-10-15   Max.   :2023   Max.   :10.000  
##                                                                           
##   collect_day       group              biomass       biomass_total   
##  Min.   : 1.00   Length:6078        Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 9.00   Class :character   1st Qu.: 0.130   1st Qu.: 1.610  
##  Median :16.00   Mode  :character   Median : 0.380   Median : 3.830  
##  Mean   :16.19                      Mean   : 1.004   Mean   : 6.026  
##  3rd Qu.:24.00                      3rd Qu.: 1.050   3rd Qu.: 8.150  
##  Max.   :31.00                      Max.   :35.820   Max.   :43.460  
##                                                                      
##  percent_biomass individual_count  count_total  
##  Min.   : 0.00   Min.   :   0.0   Min.   :   0  
##  1st Qu.: 4.55   1st Qu.:  32.0   1st Qu.: 659  
##  Median :11.85   Median :  99.0   Median :1258  
##  Mean   :16.67   Mean   : 281.9   Mean   :1690  
##  3rd Qu.:24.94   3rd Qu.: 288.0   3rd Qu.:2173  
##  Max.   :93.78   Max.   :8615.0   Max.   :9827  
##  NA's   :18
str(insect_counts)
## tibble [6,078 × 15] (S3: tbl_df/tbl/data.frame)
##  $ trap_number     : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 2 2 2 2 ...
##  $ year            : num [1:6078] 2018 2018 2018 2018 2018 ...
##  $ month           : chr [1:6078] "(July 6 - 20)" "(July 6 - 20)" "(July 6 - 20)" "(July 6 - 20)" ...
##  $ range_begin     : Date[1:6078], format: "2018-07-06" "2018-07-06" ...
##  $ range_end       : Date[1:6078], format: "2018-07-20" "2018-07-20" ...
##  $ collection_date : Date[1:6078], format: "2018-07-20" "2018-07-20" ...
##  $ collect_year    : num [1:6078] 2018 2018 2018 2018 2018 ...
##  $ collect_month   : num [1:6078] 7 7 7 7 7 7 7 7 7 7 ...
##  $ collect_day     : num [1:6078] 20 20 20 20 20 20 20 20 20 20 ...
##  $ group           : chr [1:6078] "Hymenoptera" "Diptera" "Lepidoptera" "Hemiptera" ...
##  $ biomass         : num [1:6078] 1.18 1.83 5.48 0.06 0.56 0.04 2.69 1.76 4.77 0.2 ...
##  $ biomass_total   : num [1:6078] 9.15 9.15 9.15 9.15 9.15 ...
##  $ percent_biomass : num [1:6078] 12.9 20.01 59.87 0.7 6.13 ...
##  $ individual_count: num [1:6078] 127 476 248 58 40 97 257 244 269 80 ...
##  $ count_total     : num [1:6078] 1046 1046 1046 1046 1046 ...

Visualization to confirm that both the biomass and individual counts are following expected trends. Trends should include: all orders represented, as well as each trap and year having a similar range of values.

# plot trap number vs. biomass
ggplot(
  # data into the ggplot
  data = insect_counts,
  mapping = aes(x = trap_number, y = biomass)
) +
  # scatterplot
  geom_point() +
  # label
  labs(
    x = "Trap Number",
    y = "Biomass (g)",
    color = "Order",
    title = "Biomass as a Function of Trap Number Compared Between Insect Orders"
  ) +
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  # present in a grid
  facet_wrap(group ~ ., scales = 'free', ncol=2)

# plot trap number vs. individual count
ggplot(
  # data into the ggplot
  data = insect_counts,
  mapping = aes(x = trap_number, y = individual_count)
) +
  # scatterplot 
  geom_point() +
  # label
    labs(
    x = "Trap Number",
    y = "Individual Count",
    color = "Order",
    title = "Individual Count as a Function of Trap Number Compared Between Insect Orders"
  ) +
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  # present in a grid
  facet_wrap(group ~ ., scales = 'free', ncol=2)

# plot year vs. biomass
ggplot(
  # data into the plot
  data = insect_counts,
  mapping = aes(x = year, y = biomass)
) +
  # scatter plot
  geom_point() +
  # label
    labs(
    x = "Year",
    y = "Biomass",
    color = "Order",
    title = "Biomass as a Function of Year Compared Between Insect Orders"
  ) +
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  # present in a grid
  facet_wrap(group ~ ., scales = 'free', ncol=2)

# plot year vs. individual count
ggplot(
  # data into plot
  data = insect_counts,
  mapping = aes(x = year, y = individual_count)
) +
  # scatterplot
  geom_point() +
  # label
  labs(
    x = "Year",
    y = "Individual Count",
    color = "Order",
    title = "Individual Count as a Function of Year Compared Between Insect Orders"
  ) +
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  # present in a grid
  facet_wrap(group ~ ., scales = 'free', ncol=2)

Climate Data

The Pacific Climate Impacts Consortium British Columbia Station Data Explorer (https://services.pacificclimate.org/met-data-portal-pcds/app/) was used to locate a relevant Environment Canada weather station for the District of Metchosin.

Figure: The location of three relevant Environment Canada weather stations within and near the District of Metchosin.

The Metchosin and William Head weather stations within Metchosin boundaries were missing values of interest, either weather variables or time series, and as such were excluded. The Esquimalt Harbour (Station ID 1012710) was selected as the nearest location.

Daily weather data was downloaded from Environment Canada (https://climate-change.canada.ca/climate-data/#/daily-climate-data) for the Esquimalt Harbour weather station between January 1, 2017 and December 31, 2023. This data was then manually prepared in Excel by removing all irrelevant data, resulting in the Climate Data file (https://github.com/larissaissabron/MetchosinInsectBiomass/blob/main/EsquimaltHarbour_ClimateData.csv).

Variable Units Description
id n/a Trap number, year, month, day (format of trapnumber.year.month.day)
min_temperature °C Minimum daily temperature
max_temperature °C Maximum daily temperature
mean_temperature °C Mean daily temperature
total_precipitation mm Total daily precipitation
speed_max_gust km/h Speed of the daily maximum wind gust

The climate data was uploaded to R and prepared for analysis.

# Import .csv file of climate data.
climate <- read_csv("data/EsquimaltHarbour_ClimateData.csv") %>% 
    # change all column names to be lower case to assist in joining and working between datasets
    setNames(
    names(.) %>% 
      tolower()) %>% 
  # rename columns for clarity in future aggregations of climate data by distinguishing this as daily data.
  rename(
    daily_min_temp = min_temperature,
    daily_max_temp = max_temperature,
    daily_mean_temp = mean_temperature,
    daily_tot_precip = total_precipitation,
    daily_max_gust = speed_max_gust
  )
## Rows: 2500 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ID
## dbl (5): MIN_TEMPERATURE, MAX_TEMPERATURE, MEAN_TEMPERATURE, TOTAL_PRECIPITA...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# change format from dataframe to tibble to assist R in working with large dataset
as_tibble(climate)
## # A tibble: 2,500 × 6
##    id             daily_min_temp daily_max_temp daily_mean_temp daily_tot_precip
##    <chr>                   <dbl>          <dbl>           <dbl>            <dbl>
##  1 1012710.2017.…            0              4.9             2.5             NA  
##  2 1012710.2017.…           -0.4            2.2             0.9             NA  
##  3 1012710.2017.…           -1.7            0.2            -0.8             NA  
##  4 1012710.2017.…           -1.1            4               1.5             NA  
##  5 1012710.2017.…           -2.5            3.2             0.4             NA  
##  6 1012710.2017.…           -0.7            4.6             2               NA  
##  7 1012710.2017.…            1.1            4.9             3                0  
##  8 1012710.2017.…            1.6            5.4             3.5              1.8
##  9 1012710.2017.…            3.4            7.1             5.3              5.6
## 10 1012710.2017.…           -0.4            5               2.3             NA  
## # ℹ 2,490 more rows
## # ℹ 1 more variable: daily_max_gust <dbl>
# Check out imported data to make sure nothing obviously wrong
summary(climate)
##       id            daily_min_temp   daily_max_temp  daily_mean_temp
##  Length:2500        Min.   :-8.100   Min.   :-4.20   Min.   :-6.10  
##  Class :character   1st Qu.: 4.300   1st Qu.: 9.00   1st Qu.: 6.70  
##  Mode  :character   Median : 7.600   Median :12.90   Median :10.20  
##                     Mean   : 7.609   Mean   :13.47   Mean   :10.54  
##                     3rd Qu.:11.400   3rd Qu.:17.80   3rd Qu.:14.60  
##                     Max.   :16.400   Max.   :35.90   Max.   :25.20  
##                     NA's   :27       NA's   :33      NA's   :33     
##  daily_tot_precip daily_max_gust 
##  Min.   : 0.00    Min.   : 0.00  
##  1st Qu.: 0.00    1st Qu.:33.00  
##  Median : 0.00    Median :39.00  
##  Mean   : 1.92    Mean   :36.68  
##  3rd Qu.: 1.20    3rd Qu.:46.00  
##  Max.   :60.80    Max.   :94.00  
##  NA's   :137      NA's   :1098
str(climate)
## spc_tbl_ [2,500 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id              : chr [1:2500] "1012710.2017.1.1" "1012710.2017.1.2" "1012710.2017.1.3" "1012710.2017.1.4" ...
##  $ daily_min_temp  : num [1:2500] 0 -0.4 -1.7 -1.1 -2.5 -0.7 1.1 1.6 3.4 -0.4 ...
##  $ daily_max_temp  : num [1:2500] 4.9 2.2 0.2 4 3.2 4.6 4.9 5.4 7.1 5 ...
##  $ daily_mean_temp : num [1:2500] 2.5 0.9 -0.8 1.5 0.4 2 3 3.5 5.3 2.3 ...
##  $ daily_tot_precip: num [1:2500] NA NA NA NA NA NA 0 1.8 5.6 NA ...
##  $ daily_max_gust  : num [1:2500] 57 61 57 56 0 0 0 43 61 74 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ID = col_character(),
##   ..   MIN_TEMPERATURE = col_double(),
##   ..   MAX_TEMPERATURE = col_double(),
##   ..   MEAN_TEMPERATURE = col_double(),
##   ..   TOTAL_PRECIPITATION = col_double(),
##   ..   SPEED_MAX_GUST = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

Climate data was further manipulated to streamline the workflow for aggregating climate variables of interest in later steps.

# Updating the climate dataframe.
climate <- climate %>% 
  # separate ID into it's component parts
  separate(id, c('station_id', 'year', 'month', 'day')) %>% 
  # remove original ID column
  select(-1) %>% 
  # get date into a format that is easier to work with in R
  mutate(date = make_date(year, month, day)) %>% 
  # make date the first column
  relocate(date) %>% 
  # remove working columns for year, month, and date
  select(-2, -3, -4) 

# find missing dates and add to dataframe with NAs (ChatGPT support to find out how to look for and fill in missing dates)

# look for missing dates. Generate a full sequence of dates from the minimum to maximum in the dataset
full_dates <- data.frame(date = seq(min(climate$date), max(climate$date), by = "day")) 

# Identify missing dates by performing an anti_join
missing_dates <- anti_join(full_dates, climate, by = "date")

# Add missing dates to climate dataframe
climate <- climate %>% 
  right_join(full_dates, by = "date") 

# Re-update the climate dataframe by separating out the year, month, and day
climate <- climate %>% 
  # separate ID into it's component parts
  separate(date, c('year', 'month', 'day')) %>% 
  # get date into a format that is easier to work with in R
  mutate(date = make_date(year, month, day)) %>% 
  # make date the first column
  relocate(date) 
  
# View updated climate data
print(climate)
## # A tibble: 2,556 × 9
##    date       year  month day   daily_min_temp daily_max_temp daily_mean_temp
##    <date>     <chr> <chr> <chr>          <dbl>          <dbl>           <dbl>
##  1 2017-01-01 2017  01    01               0              4.9             2.5
##  2 2017-01-02 2017  01    02              -0.4            2.2             0.9
##  3 2017-01-03 2017  01    03              -1.7            0.2            -0.8
##  4 2017-01-04 2017  01    04              -1.1            4               1.5
##  5 2017-01-05 2017  01    05              -2.5            3.2             0.4
##  6 2017-01-06 2017  01    06              -0.7            4.6             2  
##  7 2017-01-07 2017  01    07               1.1            4.9             3  
##  8 2017-01-08 2017  01    08               1.6            5.4             3.5
##  9 2017-01-09 2017  01    09               3.4            7.1             5.3
## 10 2017-01-10 2017  01    10              -0.4            5               2.3
## # ℹ 2,546 more rows
## # ℹ 2 more variables: daily_tot_precip <dbl>, daily_max_gust <dbl>
# Confirm that nothing weird since updates to climate data
summary(climate)
##       date                year              month               day           
##  Min.   :2017-01-01   Length:2556        Length:2556        Length:2556       
##  1st Qu.:2018-10-01   Class :character   Class :character   Class :character  
##  Median :2020-07-01   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2020-07-01                                                           
##  3rd Qu.:2022-04-01                                                           
##  Max.   :2023-12-31                                                           
##                                                                               
##  daily_min_temp   daily_max_temp  daily_mean_temp daily_tot_precip
##  Min.   :-8.100   Min.   :-4.20   Min.   :-6.10   Min.   : 0.00   
##  1st Qu.: 4.300   1st Qu.: 9.00   1st Qu.: 6.70   1st Qu.: 0.00   
##  Median : 7.600   Median :12.90   Median :10.20   Median : 0.00   
##  Mean   : 7.609   Mean   :13.47   Mean   :10.54   Mean   : 1.92   
##  3rd Qu.:11.400   3rd Qu.:17.80   3rd Qu.:14.60   3rd Qu.: 1.20   
##  Max.   :16.400   Max.   :35.90   Max.   :25.20   Max.   :60.80   
##  NA's   :83       NA's   :89      NA's   :89      NA's   :193     
##  daily_max_gust 
##  Min.   : 0.00  
##  1st Qu.:33.00  
##  Median :39.00  
##  Mean   :36.68  
##  3rd Qu.:46.00  
##  Max.   :94.00  
##  NA's   :1154
str(climate)
## tibble [2,556 × 9] (S3: tbl_df/tbl/data.frame)
##  $ date            : Date[1:2556], format: "2017-01-01" "2017-01-02" ...
##  $ year            : chr [1:2556] "2017" "2017" "2017" "2017" ...
##  $ month           : chr [1:2556] "01" "01" "01" "01" ...
##  $ day             : chr [1:2556] "01" "02" "03" "04" ...
##  $ daily_min_temp  : num [1:2556] 0 -0.4 -1.7 -1.1 -2.5 -0.7 1.1 1.6 3.4 -0.4 ...
##  $ daily_max_temp  : num [1:2556] 4.9 2.2 0.2 4 3.2 4.6 4.9 5.4 7.1 5 ...
##  $ daily_mean_temp : num [1:2556] 2.5 0.9 -0.8 1.5 0.4 2 3 3.5 5.3 2.3 ...
##  $ daily_tot_precip: num [1:2556] NA NA NA NA NA NA 0 1.8 5.6 NA ...
##  $ daily_max_gust  : num [1:2556] 57 61 57 56 0 0 0 43 61 74 ...

Visualization to confirm that daily recorded values make sense when compared across years.

# daily mean temperature
ggplot(
  # add data
  data = climate, 
  mapping = aes(x = month, y = daily_mean_temp, na.rm = TRUE)) + 
  # scatterplot
  geom_point() +
  # label
  labs(
    x = "Month",
    y = "Daily Mean Temperature (°C)",
    title = "Daily Mean Temperature Summarized by Month (2017-2023)"
  ) +
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  # abbreviate month names on x-axis
  scale_x_discrete(labels = month.abb) + 
  # present in a grid
  facet_wrap(year ~ ., scales = 'free', ncol=4)

# daily minimum temperature
ggplot(
  # add data
  data = climate, 
  mapping = aes(x = month, y = daily_min_temp, na.rm = TRUE)) + 
  # scatterplot
  geom_point() +
  # label
  labs(
    x = "Month",
    y = "Daily Minimum Temperature (°C)",
    title = "Daily Minimum Temperature Summarized by Month (2017-2023)"
  ) +
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  # abbreviate month names on x-axis
  scale_x_discrete(labels = month.abb) + 
  # present in a grid
  facet_wrap(year ~ ., scales = 'free', ncol=4)

# daily maximum temperature
ggplot(
  # add data
  data = climate, 
  mapping = aes(x = month, y = daily_max_temp, na.rm = TRUE)) + 
  # scatterplot
  geom_point() +
  # label
  labs(
    x = "Month",
    y = "Daily Maximum Temperature (°C)",
    title = "Daily Maximum Temperature Summarized by Month (2017-2023)"
  ) +
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  # abbreviate month names on x-axis
  scale_x_discrete(labels = month.abb) + 
  # present in a grid
  facet_wrap(year ~ ., scales = 'free', ncol=4)

# daily total precipitation
ggplot(
  # add data
  data = climate, 
  mapping = aes(x = month, y = daily_tot_precip, na.rm = TRUE)) + 
  # scatterplot
  geom_point() +
  # label
  labs(
    x = "Month",
    y = "Daily Total Precipitation (mm)",
    title = "Daily Total Precipitation Summarized by Month (2017-2023)"
  ) +
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  # abbreviate month names on x-axis
  scale_x_discrete(labels = month.abb) + 
  # present in a grid
  facet_wrap(year ~ ., scales = 'free', ncol=4)

# daily maximum wind gust
ggplot(
  # add data
  data = climate, 
  mapping = aes(x = month, y = daily_max_gust, na.rm = TRUE)) + 
  # scatterplot
  geom_point() +
  # label
  labs(
    x = "Month",
    y = "Daily Maximum Wind Gust (km/hr)",
    title = "Daily Maximum Wind Gust Summarized by Month (2017-2023)"
  ) +
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  # abbreviate month names on x-axis
  scale_x_discrete(labels = month.abb) + 
  # present in a grid
  facet_wrap(year ~ ., scales = 'free', ncol=4)

Insect Trap Location Data

The insect trap location data file was created by Annika Meije and then Larissa Bron converted the location coordinates to degrees, minutes, seconds in QGis, resulting in Traps Data file (https://github.com/larissaissabron/MetchosinInsectBiomass/blob/main/Traps.csv).

Figure: The location of the Metchosin Insect Biomass Project within the spatial context Vancouver Island, the Salish Sea, and the west coast of British Columbia and Washington.

Figure: The 20 insect trap locations spread throughout the District of Metchosin.

The insect trap location data was loaded into R.

# Import .csv file of trap location data. 
traps <- read.csv("data/Traps.csv", fileEncoding = "UTF-8") 

# change format from dataframe to tibble to assist R in working with large dataset
as_tibble(traps)
## # A tibble: 20 × 3
##    trap_number latitude       longitude       
##          <int> <chr>          <chr>           
##  1           1 48°24′19.8701″ -123°33′21.3825″
##  2           2 48°23′47.2428″ -123°34′6.3267″ 
##  3           3 48°23′29.5436″ -123°34′34.1539″
##  4           4 48°22′46.7819″ -123°35′9.5595″ 
##  5           5 48°23′3.5567″  -123°33′16.2336″
##  6           6 48°22′11.3156″ -123°32′1.6113″ 
##  7           7 48°22′27.3518″ -123°32′17.4123″
##  8           8 48°22′16.1411″ -123°33′25.1513″
##  9           9 48°21′31.5438″ -123°34′35.4979″
## 10          10 48°22′22.8000″ -123°32′5.9999″ 
## 11          11 48°23′11.7224″ -123°32′30.0531″
## 12          12 48°21′16.8023″ -123°33′25.8695″
## 13          13 48°23′28.5940″ -123°32′57.0983″
## 14          14 48°22′21.3059″ -123°33′57.7102″
## 15          15 48°22′50.4664″ -123°31′26.8201″
## 16          16 48°24′59.9873″ -123°38′13.5073″
## 17          17 48°21′14.8168″ -123°32′48.1251″
## 18          18 48°22′14.4895″ -123°31′52.2609″
## 19          19 48°23′23.4576″ -123°30′14.2268″
## 20          20 48°22′45.8311″ -123°33′44.5944″
# change trap number to be a factor
traps$trap_number <- as.factor(traps$trap_number)

# Check out imported data to make sure nothing obviously wrong
summary(traps)
##   trap_number   latitude          longitude        
##  1      : 1   Length:20          Length:20         
##  2      : 1   Class :character   Class :character  
##  3      : 1   Mode  :character   Mode  :character  
##  4      : 1                                        
##  5      : 1                                        
##  6      : 1                                        
##  (Other):14
str(traps)
## 'data.frame':    20 obs. of  3 variables:
##  $ trap_number: Factor w/ 20 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ latitude   : chr  "48°24′19.8701″" "48°23′47.2428″" "48°23′29.5436″" "48°22′46.7819″" ...
##  $ longitude  : chr  "-123°33′21.3825″" "-123°34′6.3267″" "-123°34′34.1539″" "-123°35′9.5595″" ...

Land Cover Data

The Traps.csv file was uploaded to QGis along with the Capital Regional District 1m Resolution Land Cover Raster Data (https://www.crd.bc.ca/about/data/geospatial-data) to extract land cover at buffers of 50, 100, 250, and 500 metres around each insect trap

Steps:

  1. Mapping projection set as WGS 84/UTM Zone 10N to allow distance calculation using meters (instead of degrees).

  2. The symbology of the land cover data was updated in QGis to reflect the labels provided in the Land Cover Data Report (https://maps.crd.bc.ca/Media/Reports/Land%20Cover%202017-2019%20LiDAR%20Enhanced%20Version.zip).

  3. The land cover data layer was clipped near the Metchosin boundary to limit rendering and processing power required to work with the file.

  4. Circles with diameter 50, 100, 250, and 500 metres, “buffers,” were created around each insect trap using the Buffer tool in QGis.

  5. For each buffer, the Raster Calculator tool in QGis was used to calculate the number of pixels of each land cover category contained within each buffer, the total number of pixels within each buffer, and to aggregate separately the anthropogenic and natural features.

  6. The resulting .csv file for each buffer was then exported from QGis and imported into R.

The land cover data extracted at each buffer size in meters is available:

# Import .csv files for land cover data extracted at 50, 100, 250, and 500 metre buffers

# 50 meter buffer around each insect trap
landcover_50m <- read.csv("data/ZonalHistogram_50m.csv") %>% 
  # remove lat and long because redundant
  select(-2, -3) %>% 
  # rename land cover columns so they are easier to work with
  rename(
    trap_number = Id,
    exposed_soil = HISTO_7,
    grass = HISTO_8,
    herb = HISTO_9,
    shrub_small_tree = HISTO_10,
    deciduous_tree = HISTO_11,
    coniferous_tree = HISTO_12,
    pavement_packed_gravel = HISTO_14,
    road = HISTO_15,
    buildings = HISTO_16,
    agriculture = HISTO_17,
    exposed_bedrock = HISTO_18
    ) %>% 
  # add a column for the size of the buffer
  mutate(buffer = 50) %>% 
  # total number of pixels (each pixel is 1m x 1m, so it is equivalent to area)
  mutate(total_area = exposed_soil + grass + herb + shrub_small_tree + deciduous_tree + coniferous_tree + pavement_packed_gravel + road + buildings + agriculture + exposed_bedrock) %>% 
  # total natural 
  mutate(natural = exposed_soil + grass + herb + shrub_small_tree + deciduous_tree + coniferous_tree + exposed_bedrock) %>%
  # total anthropogenic
    mutate(anthropogenic = pavement_packed_gravel + road + buildings + agriculture)

# 100 meter buffer
landcover_100m <- read.csv("data/ZonalHistogram_100m.csv") %>% 
  # remove lat and long because redundant
  select(-2, -3) %>% 
  # rename landcover columns so they are easier to work with
  rename(
    trap_number = Id,
    ocean = HISTO_1,
    lake = HISTO_2,
    exposed_soil = HISTO_7,
    grass = HISTO_8,
    herb = HISTO_9,
    shrub_small_tree = HISTO_10,
    deciduous_tree = HISTO_11,
    coniferous_tree = HISTO_12,
    pavement_packed_gravel = HISTO_14,
    road = HISTO_15,
    buildings = HISTO_16,
    agriculture = HISTO_17,
    exposed_bedrock = HISTO_18,
    aq_vegetation = HISTO_22) %>% 
  # add a column for the size of the buffer
  mutate(buffer = 100) %>% 
  # total number of pixels (each pixel is 1m x 1m, so it is equivalent to area)
  mutate(total_area = ocean + lake + exposed_soil + grass + herb + shrub_small_tree + deciduous_tree + coniferous_tree + pavement_packed_gravel + road + buildings + agriculture + exposed_bedrock + aq_vegetation) %>% 
  # total natural 
  mutate(natural = ocean + lake + exposed_soil + grass + herb + shrub_small_tree + deciduous_tree + coniferous_tree + exposed_bedrock + aq_vegetation) %>%
  # total anthropogenic
    mutate(anthropogenic = pavement_packed_gravel + road + buildings + agriculture)

# 250 meter buffer
landcover_250m <- read.csv("data/ZonalHistogram_250m.csv") %>% 
  # remove lat and long because redundant
  select(-2, -3) %>% 
  # rename landcover columns so they are easier to work with
  rename(
    trap_number = Id,
    ocean = HISTO_1,
    lake = HISTO_2,
    sand_gravel_shore = HISTO_5,
    bedrock_shore = HISTO_6,
    exposed_soil = HISTO_7,
    grass = HISTO_8,
    herb = HISTO_9,
    shrub_small_tree = HISTO_10,
    deciduous_tree = HISTO_11,
    coniferous_tree = HISTO_12,
    docks = HISTO_13,
    pavement_packed_gravel = HISTO_14,
    road = HISTO_15,
    buildings = HISTO_16,
    agriculture = HISTO_17,
    exposed_bedrock = HISTO_18,
    aq_vegetation = HISTO_22) %>% 
  # add a column for the size of the buffer
  mutate(buffer = 250) %>% 
  # total number of pixels (each pixel is 1m x 1m, so it is equivalent to area)
  mutate(total_area = ocean + lake + sand_gravel_shore + bedrock_shore + exposed_soil + grass + herb + shrub_small_tree + deciduous_tree + coniferous_tree + docks + pavement_packed_gravel + road + buildings + agriculture + exposed_bedrock + aq_vegetation) %>% 
  # total natural 
  mutate(natural = ocean + lake + sand_gravel_shore + bedrock_shore + exposed_soil + grass + herb + shrub_small_tree + deciduous_tree + coniferous_tree + exposed_bedrock + aq_vegetation) %>%
  # total anthropogenic
    mutate(anthropogenic = docks + pavement_packed_gravel + road + buildings + agriculture)
  
# 500 meter buffer  
landcover_500m <- read.csv("data/ZonalHistogram_500m.csv") %>% 
    # remove lat and long because redundant
  select(-2, -3) %>% 
  # rename landcover columns so they are easier to work with
  rename(
    trap_number = Id,
    ocean = HISTO_1,
    lake = HISTO_2,
    sand_gravel_shore = HISTO_5,
    bedrock_shore = HISTO_6,
    exposed_soil = HISTO_7,
    grass = HISTO_8,
    herb = HISTO_9,
    shrub_small_tree = HISTO_10,
    deciduous_tree = HISTO_11,
    coniferous_tree = HISTO_12,
    docks = HISTO_13,
    pavement_packed_gravel = HISTO_14,
    road = HISTO_15,
    buildings = HISTO_16,
    agriculture = HISTO_17,
    exposed_bedrock = HISTO_18,
    aq_vegetation = HISTO_22,
    loose_gravel = HISTO_24) %>% 
  # add a column for the size of the buffer
  mutate(buffer = 500) %>% 
  # total number of pixels (each pixel is 1m x 1m, so it is equivalent to area)
  mutate(total_area = ocean + lake + sand_gravel_shore + bedrock_shore + exposed_soil + grass + herb + shrub_small_tree + deciduous_tree + coniferous_tree + docks + pavement_packed_gravel + road + buildings + agriculture + exposed_bedrock + aq_vegetation + loose_gravel) %>% 
  # total natural 
  mutate(natural = ocean + lake + sand_gravel_shore + bedrock_shore + exposed_soil + grass + herb + shrub_small_tree + deciduous_tree + coniferous_tree + exposed_bedrock + aq_vegetation + loose_gravel) %>%
  # total anthropogenic
    mutate(anthropogenic = docks + pavement_packed_gravel + road + buildings + agriculture)

# join together the landcover data
landcover <- bind_rows(landcover_50m, landcover_100m, landcover_250m, landcover_500m)

# adjust the landcover data
landcover <- landcover %>% 
  # Make all NAs to 0s
  replace(is.na(landcover), 0) %>% 
  relocate(buffer, .after = trap_number) %>% 
  relocate(total_area, .after = buffer) 

# update trap number to be a factor
landcover$trap_number <- as.factor(landcover$trap_number)

# Check out imported data to make sure nothing obviously wrong
summary(landcover)
##   trap_number     buffer        total_area      exposed_soil     
##  1      : 4   Min.   : 50.0   Min.   :  7845   Min.   :    0.00  
##  2      : 4   1st Qu.: 87.5   1st Qu.: 25500   1st Qu.:   83.75  
##  3      : 4   Median :175.0   Median :113800   Median :  788.50  
##  4      : 4   Mean   :225.0   Mean   :255088   Mean   : 3236.19  
##  5      : 4   3rd Qu.:312.5   3rd Qu.:343394   3rd Qu.: 3621.00  
##  6      : 4   Max.   :500.0   Max.   :784915   Max.   :25191.00  
##  (Other):56                                                      
##      grass             herb         shrub_small_tree deciduous_tree  
##  Min.   :     0   Min.   :    0.0   Min.   :   43    Min.   :   804  
##  1st Qu.:   897   1st Qu.:  563.5   1st Qu.: 1040    1st Qu.:  3551  
##  Median :  5440   Median : 3316.5   Median : 4298    Median : 18979  
##  Mean   : 19933   Mean   : 9525.3   Mean   :15420    Mean   : 47954  
##  3rd Qu.: 28025   3rd Qu.:14470.8   3rd Qu.:21172    3rd Qu.: 71200  
##  Max.   :146646   Max.   :46590.0   Max.   :70352    Max.   :230305  
##                                                                      
##  coniferous_tree  pavement_packed_gravel      road         buildings      
##  Min.   :  1154   Min.   :    0.0        Min.   :    0   Min.   :    0.0  
##  1st Qu.:  5870   1st Qu.:  179.8        1st Qu.:    0   1st Qu.:  113.8  
##  Median : 26842   Median :  750.0        Median : 1368   Median :  651.5  
##  Mean   :106498   Mean   : 3697.6        Mean   : 3427   Mean   : 3245.7  
##  3rd Qu.:131309   3rd Qu.: 5273.0        3rd Qu.: 4674   3rd Qu.: 3595.0  
##  Max.   :549144   Max.   :34899.0        Max.   :25632   Max.   :37996.0  
##                                                                           
##   agriculture     exposed_bedrock      natural       anthropogenic   
##  Min.   :     0   Min.   :    0.0   Min.   :  3020   Min.   :     0  
##  1st Qu.:     0   1st Qu.:    0.0   1st Qu.: 13826   1st Qu.:  1096  
##  Median :     0   Median :    0.0   Median : 63030   Median :  7594  
##  Mean   : 29931   Mean   :  431.3   Mean   :214785   Mean   : 40303  
##  3rd Qu.: 11856   3rd Qu.:   80.0   3rd Qu.:246791   3rd Qu.: 32582  
##  Max.   :345214   Max.   :22785.0   Max.   :777399   Max.   :383525  
##                                                                      
##      ocean             lake         aq_vegetation    sand_gravel_shore
##  Min.   :     0   Min.   :    0.0   Min.   :  0.00   Min.   :    0    
##  1st Qu.:     0   1st Qu.:    0.0   1st Qu.:  0.00   1st Qu.:    0    
##  Median :     0   Median :    0.0   Median :  0.00   Median :    0    
##  Mean   :  8600   Mean   : 1532.1   Mean   : 43.50   Mean   : 1264    
##  3rd Qu.:     0   3rd Qu.:  900.8   3rd Qu.:  0.25   3rd Qu.:    0    
##  Max.   :218199   Max.   :28164.0   Max.   :906.00   Max.   :26665    
##                                                                       
##  bedrock_shore         docks         loose_gravel    
##  Min.   :    0.0   Min.   : 0.000   Min.   :   0.00  
##  1st Qu.:    0.0   1st Qu.: 0.000   1st Qu.:   0.00  
##  Median :    0.0   Median : 0.000   Median :   0.00  
##  Mean   :  334.2   Mean   : 1.837   Mean   :  12.65  
##  3rd Qu.:    0.0   3rd Qu.: 0.000   3rd Qu.:   0.00  
##  Max.   :15989.0   Max.   :85.000   Max.   :1012.00  
## 
str(landcover)
## 'data.frame':    80 obs. of  23 variables:
##  $ trap_number           : Factor w/ 20 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ buffer                : num  50 50 50 50 50 50 50 50 50 50 ...
##  $ total_area            : int  7851 7845 7851 7849 7854 7852 7846 7850 7855 7845 ...
##  $ exposed_soil          : int  10 125 140 5 7 0 3 37 194 22 ...
##  $ grass                 : int  148 0 127 379 209 619 2411 472 1622 609 ...
##  $ herb                  : int  70 102 647 45 177 319 580 514 258 422 ...
##  $ shrub_small_tree      : int  190 580 1052 359 253 380 604 839 926 645 ...
##  $ deciduous_tree        : int  1505 1323 1308 804 2190 1203 1829 1297 1039 1082 ...
##  $ coniferous_tree       : int  5928 5697 3857 6257 5018 3029 2121 4442 2720 3437 ...
##  $ pavement_packed_gravel: int  0 9 130 0 0 0 15 25 207 185 ...
##  $ road                  : int  0 0 524 0 0 0 0 0 680 365 ...
##  $ buildings             : int  0 0 9 0 0 0 178 217 209 0 ...
##  $ agriculture           : int  0 0 0 0 0 2302 105 0 0 1078 ...
##  $ exposed_bedrock       : int  0 9 57 0 0 0 0 7 0 0 ...
##  $ natural               : int  7851 7836 7188 7849 7854 5550 7548 7608 6759 6217 ...
##  $ anthropogenic         : int  0 9 663 0 0 2302 298 242 1096 1628 ...
##  $ ocean                 : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ lake                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ aq_vegetation         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ sand_gravel_shore     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ bedrock_shore         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ docks                 : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ loose_gravel          : num  0 0 0 0 0 0 0 0 0 0 ...

Visualization to confirm that the land cover data looks like it is spread across the insect trap sites at all four buffer sizes.

# Create custom colour pallet for land cover types
landcover_colors <- c(
  "exposed_soil" = "#6e554a",
  "grass" = "#5c8a0e",
  "herb" = "#aae651",
  "shrub_small_tree" = "#539f6b",
  "deciduous_tree" = "#b6b639",
  "coniferous_tree" = "#0d3e05",
  "pavement_packed_gravel" = "#7d9693",
  "road" = "#565860",
  "buildings" = "#3e392e",
  "agriculture" = "#df974d",
  "exposed_bedrock" = "#b0b2b2",
  "ocean" = "#2b37df",
  "lake" = "#1899c8",
  "aq_vegetation" = "#81c298",
  "sand_gravel_shore" = "#dddd82",
  "bedrock_shore" = "#60657b",
  "docks" = "#1a1e1b",
  "loose_gravel" = "#646462"
)

# Create custom names for land cover labelling
landcover_names <- c(
  "exposed_soil" = "Exposed Soil",
  "grass" = "Grass",
  "herb" = "Herb",
  "shrub_small_tree" = "Shrubs",
  "deciduous_tree" = "Deciduous Tree",
  "coniferous_tree" = "Coniferous Tree",
  "pavement_packed_gravel" = "Pavement",
  "road" = "Road",
  "buildings" = "Buildings",
  "agriculture" = "Agriculture",
  "exposed_bedrock" = "Exposed Bedrock",
  "ocean" = "Ocean",
  "lake" = "Lake",
  "aq_vegetation" = "Aquatic Vegetation",
  "sand_gravel_shore" = "Sand Shoreline",
  "bedrock_shore" = "Bedrock Shoreline",
  "docks" = "Docks",
  "loose_gravel" = "Loose Gravel"
)

# 50m buffer 

# Create a visualization dataframe for 50m buffer
landcover_50_vis <- landcover %>% 
  # select landcover extracted with a 50m buffer
  filter(buffer == 50) %>% 
  # remove buffer size, total number of pixels, anthropogenic aggregation, natural aggregation
  select(-2, -3, -15, -16) %>% 
  # exchange x and y values
  pivot_longer(
    cols = -trap_number,
    names_to = "landcover",
    values_to = "meters" 
  ) %>% 
  # change to factors so they are visualized appropriately 
  mutate_at(c('landcover'), as.factor) %>% 
  mutate_at(c('trap_number'), as.factor)

# Plot the coverage of land cover types in a 50m buffer around each trap 
ggplot(
  # Add data
  landcover_50_vis, aes(x = trap_number, y = meters, fill = landcover)) +
  # bar plot
  geom_bar(stat = "identity", position = "stack") +
  # custom colours and names for legend based on landcover values
  scale_fill_manual(values = landcover_colors,
                    labels = landcover_names) + 
  # label
  labs(
    x = "Trap Number",
    y = "Area (m<sup>2</sup>)",
    fill = "Land Cover",
    title = "Land Cover Within a 50 Metre Buffer of Each Insect Trap") +
   ylab(bquote('Area'~(m^2))) +
  # move legend to the bottom and make it less awful
   theme(
    legend.position = "bottom",
    legend.justification = "left",
    legend.text = element_text(size = 8),
    legend.title = element_text(angle = 90, vjust = 0.5, hjust = 0.5)) +
  guides(fill=guide_legend(ncol=5))

# 100m buffer

# Create a visualization dataframe for 100m buffer
landcover_100_vis <- landcover %>% 
  # select landcover extracted with a 100m buffer
  filter(buffer == 100) %>% 
  # remove buffer size, total number of pixels, anthropogenic aggregation, natural aggregation
  select(-2, -3, -15, -16) %>% 
  # exchange x and y values
  pivot_longer(
    cols = -trap_number,
    names_to = "landcover",
    values_to = "meters" 
  ) %>% 
  # change to factors so they are visualized appropriately 
  mutate_at(c('landcover'), as.factor) %>% 
  mutate_at(c('trap_number'), as.factor)

# Plot the coverage of land cover types in a 100m buffer around each trap 
ggplot(
  # Add data
  landcover_100_vis, aes(x = trap_number, y = meters, fill = landcover)) +
  # bar plot
  geom_bar(stat = "identity", position = "stack") +
  # custom colours and names for legend based on landcover values
  scale_fill_manual(values = landcover_colors,
                    labels = landcover_names) + 
  # label
  labs(
    x = "Trap Number",
    y = "Area (m<sup>2</sup>)",
    fill = "Land Cover",
    title = "Land Cover Within a 100 Metre Buffer of Each Insect Trap") +
   ylab(bquote('Area'~(m^2))) +
  # move legend to the bottom and make it less awful
   theme(
    legend.position = "bottom",
    legend.justification = "left",
    legend.text = element_text(size = 8),
    legend.title = element_text(angle = 90, vjust = 0.5, hjust = 0.5)) +
  guides(fill=guide_legend(ncol=5))

# 250m buffer

# Create a visualization dataframe for 250m buffer
landcover_250_vis <- landcover %>% 
  # select landcover extracted with a 250m buffer
  filter(buffer == 250) %>% 
  # remove buffer size, total number of pixels, anthropogenic aggregation, natural aggregation
  select(-2, -3, -15, -16) %>% 
  # exchange x and y values
  pivot_longer(
    cols = -trap_number,
    names_to = "landcover",
    values_to = "meters" 
  ) %>% 
  # change to factors so they are visualized appropriately 
  mutate_at(c('landcover'), as.factor) %>% 
  mutate_at(c('trap_number'), as.factor)

# Plot the coverage of land cover types in a 250m buffer around each trap 
ggplot(
  # Add data
  landcover_250_vis, aes(x = trap_number, y = meters, fill = landcover)) +
  # bar plot
  geom_bar(stat = "identity", position = "stack") +
  # custom colours and names for legend based on landcover values
  scale_fill_manual(values = landcover_colors,
                    labels = landcover_names) + 
  # label
  labs(
    x = "Trap Number",
    y = "Area (m<sup>2</sup>)",
    fill = "Land Cover",
    title = "Land Cover Within a 250 Metre Buffer of Each Insect Trap") +
   ylab(bquote('Area'~(m^2))) +
  # move legend to the bottom and make it less awful
   theme(
    legend.position = "bottom",
    legend.justification = "left",
    legend.text = element_text(size = 8),
    legend.title = element_text(angle = 90, vjust = 0.5, hjust = 0.5)) +
  guides(fill=guide_legend(ncol=5))

# 500m buffer

# Create a visualization dataframe for 500m buffer
landcover_500_vis <- landcover %>% 
  # select landcover extracted with a 500m buffer
  filter(buffer == 500) %>% 
  # remove buffer size, total number of pixels, anthropogenic aggregation, natural aggregation
  select(-2, -3, -15, -16) %>% 
  # exchange x and y values
  pivot_longer(
    cols = -trap_number,
    names_to = "landcover",
    values_to = "meters" 
  ) %>% 
  # change to factors so they are visualized appropriately 
  mutate_at(c('landcover'), as.factor) %>% 
  mutate_at(c('trap_number'), as.factor)

# Plot the coverage of land cover types in a 500m buffer around each trap 
ggplot(
  # Add data
  landcover_500_vis, aes(x = trap_number, y = meters, fill = landcover)) +
  # bar plot
  geom_bar(stat = "identity", position = "stack") +
  # custom colours and names for legend based on landcover values
  scale_fill_manual(values = landcover_colors,
                    labels = landcover_names) + 
  # label
  labs(
    x = "Trap Number",
    y = "Area (m<sup>2</sup>)",
    fill = "Land Cover",
    title = "Land Cover Within a 500 Metre Buffer of Each Insect Trap") +
   ylab(bquote('Area'~(m^2))) +
  # move legend to the bottom and make it less awful
   theme(
    legend.position = "bottom",
    legend.justification = "left",
    legend.text = element_text(size = 8),
    legend.title = element_text(angle = 90, vjust = 0.5, hjust = 0.5)) +
  guides(fill=guide_legend(ncol=5))

Daily Climate Data Aggregations

1. Biweekly

For each biweekly sampling interval, I calculated the mean of (1) Minimum temperature, (2) Maximum temperature, (3) Mean temperature, (4) Precipitation, and (5) Max wind gust - all of the relevant variables available from Environment Canada. This aggregation is similar to Müller et al. (2024). Additionally, I added total precipitation calculated per sampling window (Müller et. al, 2024; Seibold et. al, 2019).

# Function to aggregate daily climate data by the date range of our biweekly sampling intervals (ChatGPT code that I debugged. Used ChatGPT because I didn't know the solution to this problem was making a function).
aggregate_climate_data <- function(start_date, end_date) {
  climate %>%
    filter(date >= start_date & date <= end_date) %>%
    summarize(
      sampling_mean_min_temp = mean(daily_min_temp, na.rm = TRUE),
      sampling_mean_max_temp = mean(daily_max_temp, na.rm = TRUE),
      sampling_mean_mean_temp = mean(daily_mean_temp, na.rm = TRUE),
      sampling_mean_precipitation = mean(daily_tot_precip, na.rm = TRUE),
      sampling_mean_max_gust = mean(daily_max_gust, na.rm = TRUE),
      sampling_tot_precipitation = sum(daily_tot_precip, na.rm = TRUE)
    )
}

# Apply the function for each date range in insect_counts and add the results as new columns
aggregated_climate <- map2_dfr(insect_counts$range_begin, insect_counts$range_end, aggregate_climate_data)

# Add the aggregated biweekly sampling interval climate data as columns to the insect_counts dataframe
insect_counts <- bind_cols(insect_counts, aggregated_climate)

# Review additions to check that they make sense.
str(insect_counts)
## tibble [6,078 × 21] (S3: tbl_df/tbl/data.frame)
##  $ trap_number                : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 2 2 2 2 ...
##  $ year                       : num [1:6078] 2018 2018 2018 2018 2018 ...
##  $ month                      : chr [1:6078] "(July 6 - 20)" "(July 6 - 20)" "(July 6 - 20)" "(July 6 - 20)" ...
##  $ range_begin                : Date[1:6078], format: "2018-07-06" "2018-07-06" ...
##  $ range_end                  : Date[1:6078], format: "2018-07-20" "2018-07-20" ...
##  $ collection_date            : Date[1:6078], format: "2018-07-20" "2018-07-20" ...
##  $ collect_year               : num [1:6078] 2018 2018 2018 2018 2018 ...
##  $ collect_month              : num [1:6078] 7 7 7 7 7 7 7 7 7 7 ...
##  $ collect_day                : num [1:6078] 20 20 20 20 20 20 20 20 20 20 ...
##  $ group                      : chr [1:6078] "Hymenoptera" "Diptera" "Lepidoptera" "Hemiptera" ...
##  $ biomass                    : num [1:6078] 1.18 1.83 5.48 0.06 0.56 0.04 2.69 1.76 4.77 0.2 ...
##  $ biomass_total              : num [1:6078] 9.15 9.15 9.15 9.15 9.15 ...
##  $ percent_biomass            : num [1:6078] 12.9 20.01 59.87 0.7 6.13 ...
##  $ individual_count           : num [1:6078] 127 476 248 58 40 97 257 244 269 80 ...
##  $ count_total                : num [1:6078] 1046 1046 1046 1046 1046 ...
##  $ sampling_mean_min_temp     : num [1:6078] 12.8 12.8 12.8 12.8 12.8 ...
##  $ sampling_mean_max_temp     : num [1:6078] 20.7 20.7 20.7 20.7 20.7 ...
##  $ sampling_mean_mean_temp    : num [1:6078] 16.7 16.7 16.7 16.7 16.7 ...
##  $ sampling_mean_precipitation: num [1:6078] 0 0 0 0 0 0 0 0 0 0 ...
##  $ sampling_mean_max_gust     : num [1:6078] 39 39 39 39 39 39 39 39 39 39 ...
##  $ sampling_tot_precipitation : num [1:6078] 0 0 0 0 0 0 0 0 0 0 ...
summary(insect_counts)
##   trap_number        year         month            range_begin        
##  14     : 414   Min.   :2018   Length:6078        Min.   :2018-07-06  
##  9      : 408   1st Qu.:2019   Class :character   1st Qu.:2019-08-08  
##  12     : 408   Median :2020   Mode  :character   Median :2020-09-01  
##  13     : 408   Mean   :2020                      Mean   :2020-11-28  
##  4      : 402   3rd Qu.:2022                      3rd Qu.:2022-05-01  
##  5      : 396   Max.   :2023                      Max.   :2023-10-02  
##  (Other):3642                                                         
##    range_end          collection_date       collect_year  collect_month   
##  Min.   :2018-07-20   Min.   :2018-07-20   Min.   :2018   Min.   : 5.000  
##  1st Qu.:2019-08-21   1st Qu.:2019-08-21   1st Qu.:2019   1st Qu.: 6.000  
##  Median :2020-09-14   Median :2020-09-14   Median :2020   Median : 8.000  
##  Mean   :2020-12-10   Mean   :2020-12-11   Mean   :2020   Mean   : 7.694  
##  3rd Qu.:2022-05-14   3rd Qu.:2022-05-14   3rd Qu.:2022   3rd Qu.: 9.000  
##  Max.   :2023-10-15   Max.   :2023-10-15   Max.   :2023   Max.   :10.000  
##                                                                           
##   collect_day       group              biomass       biomass_total   
##  Min.   : 1.00   Length:6078        Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 9.00   Class :character   1st Qu.: 0.130   1st Qu.: 1.610  
##  Median :16.00   Mode  :character   Median : 0.380   Median : 3.830  
##  Mean   :16.19                      Mean   : 1.004   Mean   : 6.026  
##  3rd Qu.:24.00                      3rd Qu.: 1.050   3rd Qu.: 8.150  
##  Max.   :31.00                      Max.   :35.820   Max.   :43.460  
##                                                                      
##  percent_biomass individual_count  count_total   sampling_mean_min_temp
##  Min.   : 0.00   Min.   :   0.0   Min.   :   0   Min.   : 6.50         
##  1st Qu.: 4.55   1st Qu.:  32.0   1st Qu.: 659   1st Qu.:10.20         
##  Median :11.85   Median :  99.0   Median :1258   Median :11.62         
##  Mean   :16.67   Mean   : 281.9   Mean   :1690   Mean   :11.20         
##  3rd Qu.:24.94   3rd Qu.: 288.0   3rd Qu.:2173   3rd Qu.:12.81         
##  Max.   :93.78   Max.   :8615.0   Max.   :9827   Max.   :13.55         
##  NA's   :18                                                            
##  sampling_mean_max_temp sampling_mean_mean_temp sampling_mean_precipitation
##  Min.   :11.69          Min.   : 9.207          Min.   :0.00000            
##  1st Qu.:16.27          1st Qu.:13.279          1st Qu.:0.04615            
##  Median :18.19          Median :14.929          Median :0.44286            
##  Mean   :18.14          Mean   :14.670          Mean   :0.90678            
##  3rd Qu.:20.11          3rd Qu.:16.629          3rd Qu.:1.22857            
##  Max.   :23.96          Max.   :18.750          Max.   :4.54286            
##                                                                            
##  sampling_mean_max_gust sampling_tot_precipitation
##  Min.   :34.40          Min.   : 0.00             
##  1st Qu.:38.33          1st Qu.: 0.60             
##  Median :41.30          Median : 6.20             
##  Mean   :41.16          Mean   :12.54             
##  3rd Qu.:43.33          3rd Qu.:17.20             
##  Max.   :57.00          Max.   :63.60             
## 

Visualization to confirm that climate data aggregated at the level of biweekly sampling interval is appropriately distributed in time.

# plot of the sampling interval mean minimum temperature grouped by months and compared between years 
ggplot(
  # input data
  data = insect_counts, 
  mapping = aes(x = collect_month, y = sampling_mean_min_temp)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Month",
    y = "Mean Minimum Temperature (°C) of the Sampling Intervals",
    title = "The Mean Minimum Temperature of the Sampling Intervals Compared\nMonthly Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") +
  # labelling x axis with abbreviated month names
  scale_x_continuous(
    breaks = unique(insect_counts$collect_month),  
    labels = month.abb[unique(insect_counts$collect_month)]
  ) +
  # present in a grid of years
  facet_wrap(collect_year ~ ., scales = 'free', ncol=4)

# plot of the sampling interval mean of the mean temperature grouped by months and compared between years 
ggplot(
  # input data
  data = insect_counts, 
  mapping = aes(x = collect_month, y = sampling_mean_mean_temp)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Month",
    y = "Mean of the Mean Temperature (°C) of the Sampling Intervals",
    title = "The Mean of the Mean Temperature of the Sampling Intervals Compared\nMonthly Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") +
  # labelling x axis with abbreviated month names
  scale_x_continuous(
    breaks = unique(insect_counts$collect_month),  
    labels = month.abb[unique(insect_counts$collect_month)]
  ) +
  # present in a grid of years
  facet_wrap(collect_year ~ ., scales = 'free', ncol=4)

# plot of the sampling interval mean of the maximum temperature grouped by months and compared between years 
ggplot(
  # input data
  data = insect_counts, 
  mapping = aes(x = collect_month, y = sampling_mean_max_temp)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Month",
    y = "Mean Maximum Temperature (°C) of the Sampling Intervals",
    title = "The Mean Maximum Temperature of the Sampling Intervals Compared\nMonthly Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") +
  # labelling x axis with abbreviated month names
  scale_x_continuous(
    breaks = unique(insect_counts$collect_month),  
    labels = month.abb[unique(insect_counts$collect_month)]
  ) +
  # present in a grid of years
  facet_wrap(collect_year ~ ., scales = 'free', ncol=4)

# plot of the sampling interval mean of the mean precipitation grouped by months and compared between years 
ggplot(
  # input data
  data = insect_counts, 
  mapping = aes(x = collect_month, y = sampling_mean_precipitation)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Month",
    y = "Mean Precipitation (mm) of the Sampling Intervals",
    title = "The Mean Precipitation of the Sampling Intervals Compared\nMonthly Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") +
  # labelling x axis with abbreviated month names
  scale_x_continuous(
    breaks = unique(insect_counts$collect_month),  
    labels = month.abb[unique(insect_counts$collect_month)]
  ) +
  # present in a grid of years
  facet_wrap(collect_year ~ ., scales = 'free', ncol=4)

# plot of the sampling interval total precipitation grouped by months and compared between years 
ggplot(
  # input data
  data = insect_counts, 
  mapping = aes(x = collect_month, y = sampling_tot_precipitation)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Month",
    y = "Mean Total Precipitation (mm) of the Sampling Intervals",
    title = "The Total Precipitation of the Sampling Intervals Compared\nMonthly Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") +
  # labelling x axis with abbreviated month names
  scale_x_continuous(
    breaks = unique(insect_counts$collect_month),  
    labels = month.abb[unique(insect_counts$collect_month)]
  ) +
  # present in a grid of years
  facet_wrap(collect_year ~ ., scales = 'free', ncol=4)

# plot of the sampling interval mean maximum wind gust grouped by months and compared between years 
ggplot(
  # input data
  data = insect_counts, 
  mapping = aes(x = collect_month, y = sampling_mean_max_gust)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Month",
    y = "Mean Maximum Wind Gust (km/h) of the Sampling Intervals",
    title = "The Mean Maximum Wind Gust of the Sampling Intervals Compared\nMonthly Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") +
  # labelling x axis with abbreviated month names
  scale_x_continuous(
    breaks = unique(insect_counts$collect_month),  
    labels = month.abb[unique(insect_counts$collect_month)]
  ) +
  # present in a grid of years
  facet_wrap(collect_year ~ ., scales = 'free', ncol=4)

2. Yearly

Calculated the mean temperature, number of frost days (temp <0°C), number of warm days (daily mean temperature >20°C), and also the precipitation sum for the winter, growing period, and year (Hallmann et al., 2017; Seibold et al., 2019; Uhler, 2021).

# climate variables summarized by year
climate_calcs_year <- climate %>% 
  group_by(year) %>% 
  summarize(year_mean_temp = mean(daily_mean_temp, na.rm = TRUE),
         year_days_below_zero = sum(daily_min_temp < 0, na.rm = TRUE),
         year_days_above_25 = sum(daily_max_temp > 25, na.rm = TRUE),
         year_tot_precip = sum(daily_tot_precip, na.rm = TRUE)) %>% 
  # remove extra row of NA, na, NA
  na.omit()

# change year to be numeric
climate_calcs_year$year <- as.numeric(climate_calcs_year$year)

# Review to make sense of year trends
str(climate_calcs_year)
## tibble [7 × 5] (S3: tbl_df/tbl/data.frame)
##  $ year                : num [1:7] 2017 2018 2019 2020 2021 ...
##  $ year_mean_temp      : num [1:7] 10.3 10.9 10.6 10.6 10.4 ...
##  $ year_days_below_zero: int [1:7] 17 5 13 7 14 18 7
##  $ year_days_above_25  : int [1:7] 3 2 1 9 13 9 8
##  $ year_tot_precip     : num [1:7] 618 632 628 862 862 ...
summary(climate_calcs_year)
##       year      year_mean_temp  year_days_below_zero year_days_above_25
##  Min.   :2017   Min.   :10.09   Min.   : 5.00        Min.   : 1.000    
##  1st Qu.:2018   1st Qu.:10.36   1st Qu.: 7.00        1st Qu.: 2.500    
##  Median :2020   Median :10.56   Median :13.00        Median : 8.000    
##  Mean   :2020   Mean   :10.55   Mean   :11.57        Mean   : 6.429    
##  3rd Qu.:2022   3rd Qu.:10.77   3rd Qu.:15.50        3rd Qu.: 9.000    
##  Max.   :2023   Max.   :10.91   Max.   :18.00        Max.   :13.000    
##  year_tot_precip
##  Min.   :410.2  
##  1st Qu.:570.9  
##  Median :628.0  
##  Mean   :648.0  
##  3rd Qu.:747.0  
##  Max.   :861.8

Visualization to confirm that climate data aggregated at the level of year is appropriately distributed in time.

# change year to be a factor so every year labelled on x-axis
climate_calcs_year$year <- as.factor(climate_calcs_year$year)

# year mean temperature plot
ggplot(
  # input data
  data = climate_calcs_year, 
  mapping = aes(x = year, y = year_mean_temp)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Year",
    y = "Mean Yearly Temperature (ºC)",
    title = "Mean Yearly Temperature Compared Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") +
  scale_x_discrete(labels = xlab)

# year days below zero
ggplot(
  # input data
  data = climate_calcs_year, 
  mapping = aes(x = year, y = year_days_below_zero)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Year",
    y = "Number of Days Below 0°C",
    title = "Number of Days Below 0°C Compared Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") 

# year days above 25
ggplot(
  # input data
  data = climate_calcs_year, 
  mapping = aes(x = year, y = year_days_above_25)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Year",
    y = "Number of Days Above 25°C",
    title = "Number of Days Above 25°C Compared Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") 

# total precipitation
ggplot(
  # input data
  data = climate_calcs_year, 
  mapping = aes(x = year, y = year_tot_precip)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Year",
    y = "Total Precipitation (mm)",
    title = "Total Precipitation Compared Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") 

# put year back to numeric for calculations
climate_calcs_year$year <- c("2017", "2018", "2019", "2020", "2021", "2022", "2023")
climate_calcs_year$year <- as.numeric(climate_calcs_year$year)

3. Monthly

Preparation for final aggregations of climate data. Welti et al. (2022) incorporated monthly means for maximum and minimum temperature, as well as monthly total precipitation.

# climate variables summarized by month 
climate_calcs_month <-  climate %>% 
  group_by(year, month) %>% 
    summarise(month_tot_precip = sum(daily_tot_precip, na.rm = TRUE),
              month_mean_temp = mean(daily_mean_temp, na.rm = TRUE))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
# change year and month to be numeric
climate_calcs_month$month <- as.numeric(climate_calcs_month$month)
climate_calcs_month$year <- as.numeric(climate_calcs_month$year)

# check out how these are looking
str(climate_calcs_month)
## gropd_df [84 × 4] (S3: grouped_df/tbl_df/tbl/data.frame)
##  $ year            : num [1:84] 2017 2017 2017 2017 2017 ...
##  $ month           : num [1:84] 1 2 3 4 5 6 7 8 9 10 ...
##  $ month_tot_precip: num [1:84] 37.8 44.2 79.2 37.2 20.4 ...
##  $ month_mean_temp : num [1:84] 3.95 4.53 7.02 9.61 12.27 ...
##  - attr(*, "groups")= tibble [7 × 2] (S3: tbl_df/tbl/data.frame)
##   ..$ year : num [1:7] 2017 2018 2019 2020 2021 ...
##   ..$ .rows: list<int> [1:7] 
##   .. ..$ : int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..$ : int [1:12] 13 14 15 16 17 18 19 20 21 22 ...
##   .. ..$ : int [1:12] 25 26 27 28 29 30 31 32 33 34 ...
##   .. ..$ : int [1:12] 37 38 39 40 41 42 43 44 45 46 ...
##   .. ..$ : int [1:12] 49 50 51 52 53 54 55 56 57 58 ...
##   .. ..$ : int [1:12] 61 62 63 64 65 66 67 68 69 70 ...
##   .. ..$ : int [1:12] 73 74 75 76 77 78 79 80 81 82 ...
##   .. ..@ ptype: int(0) 
##   ..- attr(*, ".drop")= logi TRUE
summary(climate_calcs_month)
##       year          month       month_tot_precip month_mean_temp 
##  Min.   :2017   Min.   : 1.00   Min.   :  0.00   Min.   : 2.521  
##  1st Qu.:2018   1st Qu.: 3.75   1st Qu.: 14.10   1st Qu.: 6.641  
##  Median :2020   Median : 6.50   Median : 36.40   Median :10.142  
##  Mean   :2020   Mean   : 6.50   Mean   : 54.00   Mean   :10.544  
##  3rd Qu.:2022   3rd Qu.: 9.25   3rd Qu.: 80.95   3rd Qu.:14.612  
##  Max.   :2023   Max.   :12.00   Max.   :240.20   Max.   :17.465  
##                                                  NA's   :1

Visualization to confirm that the distribution of these monthly variables makes sense.

# change month to be a factor so all labelled on x-axis
climate_calcs_month$month <- as.factor(climate_calcs_month$month)

# total precipitation
ggplot(
  # input data
  data = climate_calcs_month, 
  mapping = aes(x = month, y = month_tot_precip)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Month",
    y = "Total Precipitation (mm)",
    title = "Total Monthly Precipitation Compared Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") +
  # present in a grid of years
  facet_wrap(year ~ ., scales = 'free', ncol=4) +
  # abbreviate month names on x-axis
  scale_x_discrete(labels = month.abb) 

# mean temperature
ggplot(
  # input data
  data = climate_calcs_month, 
  mapping = aes(x = month, y = month_mean_temp)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Month",
    y = "Mean Temperature (°C)",
    title = "Mean Monthly Temperature Compared Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") +
  # present in a grid of years
  facet_wrap(year ~ ., scales = 'free', ncol= 4) +
  # abbreviate month names on x-axis
  scale_x_discrete(labels = month.abb) 

# change month to be a number again
climate_calcs_month$month <- as.numeric(climate_calcs_month$month)

4. Previous year winter

Previous year winter (November-February) precipitation and temperature (Hallmann et al., 2017; Seibold et al., 2019).

# lets start with the hardest one first, previous winter, which is nov and dec of previous year and jan, feb of current year (ChatGPT for coding support and debugged by me). 

prev_year_winter_precip <- climate_calcs_month %>%
  # Filter to include only the months November (11), December (12), January (1), and February (2)
  filter(month %in% c(11, 12, 1, 2)) %>%
  # Adjust year for Nov and Dec to be counted in the next year's total
  mutate(adjusted_year = ifelse(month %in% c(11, 12), year + 1, year)) %>%
  # Filter for Adjusted_Year in the range 2018–2023
  filter(adjusted_year >= 2018 & adjusted_year <= 2023) %>%
  # Group by Adjusted_Year
  group_by(adjusted_year) %>%
  # Summarize total precipitation for these winter months
  summarize(prev_year_winter_precip = sum(month_tot_precip, na.rm = TRUE)) %>%
  # Rename Adjusted_Year back to Year for merging
  rename(year = adjusted_year)

# Join this to the yearly climate calcs
climate_calcs_year <- climate_calcs_year %>%
  left_join(prev_year_winter_precip, by = "year")

# remove working dataframe
rm(prev_year_winter_precip)

### Now previous year winter temp

# Add a column for "prev_year_winter_temp"
prev_year_winter_temp <- climate_calcs_month %>%
  # Filter to include only the months November (11), December (12), January (1), and February (2)
  filter(month %in% c(11, 12, 1, 2)) %>%
  # Adjust year for Nov and Dec to be counted in the next year's total
  mutate(adjusted_year = ifelse(month %in% c(11, 12), year + 1, year)) %>%
  # Filter for Adjusted_Year in the range 2018–2023
  filter(adjusted_year >= 2018 & adjusted_year <= 2023) %>%
  # Group by Adjusted_Year
  group_by(adjusted_year) %>%
  # Summarize mean temperature for these winter months
  summarize(prev_year_winter_temp = mean(month_mean_temp, na.rm = TRUE)) %>%
  # Rename Adjusted_Year back to Year for merging
  rename(year = adjusted_year)

# Join this to the yearly climate calcs
climate_calcs_year <- climate_calcs_year %>%
  left_join(prev_year_winter_temp, by = "year")

# remove working dataframe
rm(prev_year_winter_temp)

# check out how this looks
str(climate_calcs_year)
## tibble [7 × 7] (S3: tbl_df/tbl/data.frame)
##  $ year                   : num [1:7] 2017 2018 2019 2020 2021 ...
##  $ year_mean_temp         : num [1:7] 10.3 10.9 10.6 10.6 10.4 ...
##  $ year_days_below_zero   : int [1:7] 17 5 13 7 14 18 7
##  $ year_days_above_25     : int [1:7] 3 2 1 9 13 9 8
##  $ year_tot_precip        : num [1:7] 618 632 628 862 862 ...
##  $ prev_year_winter_precip: num [1:7] NA 463 438 503 489 ...
##  $ prev_year_winter_temp  : num [1:7] NA 6.03 6.15 6.45 6.16 ...
summary(climate_calcs_year)
##       year      year_mean_temp  year_days_below_zero year_days_above_25
##  Min.   :2017   Min.   :10.09   Min.   : 5.00        Min.   : 1.000    
##  1st Qu.:2018   1st Qu.:10.36   1st Qu.: 7.00        1st Qu.: 2.500    
##  Median :2020   Median :10.56   Median :13.00        Median : 8.000    
##  Mean   :2020   Mean   :10.55   Mean   :11.57        Mean   : 6.429    
##  3rd Qu.:2022   3rd Qu.:10.77   3rd Qu.:15.50        3rd Qu.: 9.000    
##  Max.   :2023   Max.   :10.91   Max.   :18.00        Max.   :13.000    
##                                                                        
##  year_tot_precip prev_year_winter_precip prev_year_winter_temp
##  Min.   :410.2   Min.   :221.0           Min.   :5.175        
##  1st Qu.:570.9   1st Qu.:444.4           1st Qu.:5.673        
##  Median :628.0   Median :469.9           Median :6.092        
##  Mean   :648.0   Mean   :431.9           Mean   :5.922        
##  3rd Qu.:747.0   3rd Qu.:485.9           3rd Qu.:6.161        
##  Max.   :861.8   Max.   :503.4           Max.   :6.453        
##                  NA's   :1               NA's   :1

Visualization to confirm that the distribution of these previous year variables make sense.

# change year to be a factor so every year labelled on x-axis
climate_calcs_year$year <- as.factor(climate_calcs_year$year)

# previous year winter precipitation
climate_calcs_year %>% 
  ggplot(
  # input data
  data = climate_calcs_year, 
  mapping = aes(x = year, y = prev_year_winter_precip)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Year",
    y = "Total Precipitation (mm) of the Previous Year Winter",
    title = "Previous Winter Total Precipitation Compared Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none")

# previous year temperature
ggplot(
  # input data
  data = climate_calcs_year, 
  mapping = aes(x = year, y = prev_year_winter_temp)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Year",
    y = "Mean Temperature (°C) of the Previous Year Winter",
    title = "Previous Winter Mean Temperature Compared Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") 

# put year back to numeric for calculations
climate_calcs_year$year <- c("2017", "2018", "2019", "2020", "2021", "2022", "2023")
climate_calcs_year$year <- as.numeric(climate_calcs_year$year)

5. Current year growing season

As done by Seibold et al. (2019).

# calculate total precipitation during the growing season of the year
growing_szn_tot_precip <- climate_calcs_month %>% 
  group_by(year) %>% 
  filter(month %in% c(3, 4, 5, 6, 7, 8, 9, 10)) %>% 
  summarize(year_growing_szn_tot_precip = sum(month_tot_precip, na.rm = TRUE)) %>% 
  # remove extra row of NA, na, NA
  na.omit()

# join growing season total precipitation back to monthly climate calcs
climate_calcs_year <- climate_calcs_year %>%
  left_join(growing_szn_tot_precip) 

# remove working dataframe
rm(growing_szn_tot_precip)

# Add climate variables summarized by year to insect_counts
insect_counts <- insect_counts %>% 
  left_join(climate_calcs_year)

# check out how this looks
str(climate_calcs_year)
## tibble [7 × 8] (S3: tbl_df/tbl/data.frame)
##  $ year                       : num [1:7] 2017 2018 2019 2020 2021 ...
##  $ year_mean_temp             : num [1:7] 10.3 10.9 10.6 10.6 10.4 ...
##  $ year_days_below_zero       : int [1:7] 17 5 13 7 14 18 7
##  $ year_days_above_25         : int [1:7] 3 2 1 9 13 9 8
##  $ year_tot_precip            : num [1:7] 618 632 628 862 862 ...
##  $ prev_year_winter_precip    : num [1:7] NA 463 438 503 489 ...
##  $ prev_year_winter_temp      : num [1:7] NA 6.03 6.15 6.45 6.16 ...
##  $ year_growing_szn_tot_precip: num [1:7] 283 116 286 297 286 ...
summary(climate_calcs_year)
##       year      year_mean_temp  year_days_below_zero year_days_above_25
##  Min.   :2017   Min.   :10.09   Min.   : 5.00        Min.   : 1.000    
##  1st Qu.:2018   1st Qu.:10.36   1st Qu.: 7.00        1st Qu.: 2.500    
##  Median :2020   Median :10.56   Median :13.00        Median : 8.000    
##  Mean   :2020   Mean   :10.55   Mean   :11.57        Mean   : 6.429    
##  3rd Qu.:2022   3rd Qu.:10.77   3rd Qu.:15.50        3rd Qu.: 9.000    
##  Max.   :2023   Max.   :10.91   Max.   :18.00        Max.   :13.000    
##                                                                        
##  year_tot_precip prev_year_winter_precip prev_year_winter_temp
##  Min.   :410.2   Min.   :221.0           Min.   :5.175        
##  1st Qu.:570.9   1st Qu.:444.4           1st Qu.:5.673        
##  Median :628.0   Median :469.9           Median :6.092        
##  Mean   :648.0   Mean   :431.9           Mean   :5.922        
##  3rd Qu.:747.0   3rd Qu.:485.9           3rd Qu.:6.161        
##  Max.   :861.8   Max.   :503.4           Max.   :6.453        
##                  NA's   :1               NA's   :1            
##  year_growing_szn_tot_precip
##  Min.   :115.8              
##  1st Qu.:208.4              
##  Median :282.6              
##  Mean   :240.6              
##  3rd Qu.:286.0              
##  Max.   :297.2              
## 

Visualization to confirm precipitation during the growing season trends are logical.

# change year to be a factor so every year labelled on x-axis
climate_calcs_year$year <- as.factor(climate_calcs_year$year)

# precipitation during the growing season
ggplot(
  # input data
  data = climate_calcs_year, 
  mapping = aes(x = year, y = year_growing_szn_tot_precip)) +
  # scatterplot
  geom_point() +
  # label everything
  labs(
    x = "Year",
    y = "Total Precipitation (mm) During the Growing Season",
    title = "Total Precipitation During the Growing Season Compared Between Years"
  )  +
  # improve the look of the x-axis
  # angle text vertical
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "none") 

# put year back to numeric for calculations
climate_calcs_year$year <- c("2017", "2018", "2019", "2020", "2021", "2022", "2023")
climate_calcs_year$year <- as.numeric(climate_calcs_year$year)

Join Data Together into One Dataframe

# join location data to the working dataframe
insect_counts <- insect_counts %>% 
  left_join(traps)

# add landcover data to insect_counts
insect_counts <- insect_counts %>% 
  left_join(landcover, join_by(trap_number))

# rearrange data

# make list of column names in order
col_order <- c("trap_number", 
               "latitude",
               "longitude",
               "year",
               "month",
               "range_begin",
               "range_end",
               "collection_date",
               "collect_year",
               "collect_month",
               "collect_day",
               "group",
               "biomass",
               "biomass_total",
               "percent_biomass",
               "individual_count",
               "count_total",
               "sampling_mean_min_temp",
               "sampling_mean_max_temp", 
               "sampling_mean_mean_temp",
               "sampling_mean_precipitation",
               "sampling_mean_max_gust",
               "sampling_tot_precipitation",
               "year_mean_temp",
               "year_days_below_zero",
               "year_days_above_25",
               "year_tot_precip",
               "prev_year_winter_precip",
               "prev_year_winter_temp",
               "year_growing_szn_tot_precip",
               "buffer",
               "total_area",
               "exposed_soil",
               "grass",
               "herb",
               "shrub_small_tree",
               "deciduous_tree",
               "coniferous_tree",
               "pavement_packed_gravel",
               "road",
               "buildings",
               "agriculture",
               "exposed_bedrock",
               "ocean",
               "lake",
               "aq_vegetation",
               "sand_gravel_shore",
               "bedrock_shore",
               "docks",
               "loose_gravel",
               "natural",
               "anthropogenic")

# rearrange columns
insect_counts <- insect_counts[, col_order]

# Review additions to check that they make sense.
str(insect_counts)
## tibble [24,312 × 52] (S3: tbl_df/tbl/data.frame)
##  $ trap_number                : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ latitude                   : chr [1:24312] "48°24′19.8701″" "48°24′19.8701″" "48°24′19.8701″" "48°24′19.8701″" ...
##  $ longitude                  : chr [1:24312] "-123°33′21.3825″" "-123°33′21.3825″" "-123°33′21.3825″" "-123°33′21.3825″" ...
##  $ year                       : num [1:24312] 2018 2018 2018 2018 2018 ...
##  $ month                      : chr [1:24312] "(July 6 - 20)" "(July 6 - 20)" "(July 6 - 20)" "(July 6 - 20)" ...
##  $ range_begin                : Date[1:24312], format: "2018-07-06" "2018-07-06" ...
##  $ range_end                  : Date[1:24312], format: "2018-07-20" "2018-07-20" ...
##  $ collection_date            : Date[1:24312], format: "2018-07-20" "2018-07-20" ...
##  $ collect_year               : num [1:24312] 2018 2018 2018 2018 2018 ...
##  $ collect_month              : num [1:24312] 7 7 7 7 7 7 7 7 7 7 ...
##  $ collect_day                : num [1:24312] 20 20 20 20 20 20 20 20 20 20 ...
##  $ group                      : chr [1:24312] "Hymenoptera" "Hymenoptera" "Hymenoptera" "Hymenoptera" ...
##  $ biomass                    : num [1:24312] 1.18 1.18 1.18 1.18 1.83 1.83 1.83 1.83 5.48 5.48 ...
##  $ biomass_total              : num [1:24312] 9.15 9.15 9.15 9.15 9.15 9.15 9.15 9.15 9.15 9.15 ...
##  $ percent_biomass            : num [1:24312] 12.9 12.9 12.9 12.9 20 ...
##  $ individual_count           : num [1:24312] 127 127 127 127 476 476 476 476 248 248 ...
##  $ count_total                : num [1:24312] 1046 1046 1046 1046 1046 ...
##  $ sampling_mean_min_temp     : num [1:24312] 12.8 12.8 12.8 12.8 12.8 ...
##  $ sampling_mean_max_temp     : num [1:24312] 20.7 20.7 20.7 20.7 20.7 ...
##  $ sampling_mean_mean_temp    : num [1:24312] 16.7 16.7 16.7 16.7 16.7 ...
##  $ sampling_mean_precipitation: num [1:24312] 0 0 0 0 0 0 0 0 0 0 ...
##  $ sampling_mean_max_gust     : num [1:24312] 39 39 39 39 39 39 39 39 39 39 ...
##  $ sampling_tot_precipitation : num [1:24312] 0 0 0 0 0 0 0 0 0 0 ...
##  $ year_mean_temp             : num [1:24312] 10.9 10.9 10.9 10.9 10.9 ...
##  $ year_days_below_zero       : int [1:24312] 5 5 5 5 5 5 5 5 5 5 ...
##  $ year_days_above_25         : int [1:24312] 2 2 2 2 2 2 2 2 2 2 ...
##  $ year_tot_precip            : num [1:24312] 632 632 632 632 632 ...
##  $ prev_year_winter_precip    : num [1:24312] 463 463 463 463 463 ...
##  $ prev_year_winter_temp      : num [1:24312] 6.03 6.03 6.03 6.03 6.03 ...
##  $ year_growing_szn_tot_precip: num [1:24312] 116 116 116 116 116 ...
##  $ buffer                     : num [1:24312] 50 100 250 500 50 100 250 500 50 100 ...
##  $ total_area                 : int [1:24312] 7851 31406 196233 784886 7851 31406 196233 784886 7851 31406 ...
##  $ exposed_soil               : int [1:24312] 10 131 2285 9803 10 131 2285 9803 10 131 ...
##  $ grass                      : int [1:24312] 148 846 12862 51910 148 846 12862 51910 148 846 ...
##  $ herb                       : int [1:24312] 70 218 6043 20399 70 218 6043 20399 70 218 ...
##  $ shrub_small_tree           : int [1:24312] 190 1105 14603 51117 190 1105 14603 51117 190 1105 ...
##  $ deciduous_tree             : int [1:24312] 1505 7138 38664 173481 1505 7138 38664 173481 1505 7138 ...
##  $ coniferous_tree            : int [1:24312] 5928 21521 112476 446210 5928 21521 112476 446210 5928 21521 ...
##  $ pavement_packed_gravel     : int [1:24312] 0 11 2036 9968 0 11 2036 9968 0 11 ...
##  $ road                       : int [1:24312] 0 0 4352 12024 0 0 4352 12024 0 0 ...
##  $ buildings                  : int [1:24312] 0 436 2837 8624 0 436 2837 8624 0 436 ...
##  $ agriculture                : int [1:24312] 0 0 0 0 0 0 0 0 0 0 ...
##  $ exposed_bedrock            : int [1:24312] 0 0 75 366 0 0 75 366 0 0 ...
##  $ ocean                      : num [1:24312] 0 0 0 0 0 0 0 0 0 0 ...
##  $ lake                       : num [1:24312] 0 0 0 984 0 0 0 984 0 0 ...
##  $ aq_vegetation              : num [1:24312] 0 0 0 0 0 0 0 0 0 0 ...
##  $ sand_gravel_shore          : num [1:24312] 0 0 0 0 0 0 0 0 0 0 ...
##  $ bedrock_shore              : num [1:24312] 0 0 0 0 0 0 0 0 0 0 ...
##  $ docks                      : num [1:24312] 0 0 0 0 0 0 0 0 0 0 ...
##  $ loose_gravel               : num [1:24312] 0 0 0 0 0 0 0 0 0 0 ...
##  $ natural                    : int [1:24312] 7851 30959 187008 754270 7851 30959 187008 754270 7851 30959 ...
##  $ anthropogenic              : int [1:24312] 0 447 9225 30616 0 447 9225 30616 0 447 ...
summary(insect_counts)
##   trap_number      latitude          longitude              year     
##  14     : 1656   Length:24312       Length:24312       Min.   :2018  
##  9      : 1632   Class :character   Class :character   1st Qu.:2019  
##  12     : 1632   Mode  :character   Mode  :character   Median :2020  
##  13     : 1632                                         Mean   :2020  
##  4      : 1608                                         3rd Qu.:2022  
##  5      : 1584                                         Max.   :2023  
##  (Other):14568                                                       
##     month            range_begin           range_end         
##  Length:24312       Min.   :2018-07-06   Min.   :2018-07-20  
##  Class :character   1st Qu.:2019-08-08   1st Qu.:2019-08-21  
##  Mode  :character   Median :2020-09-01   Median :2020-09-14  
##                     Mean   :2020-11-28   Mean   :2020-12-10  
##                     3rd Qu.:2022-05-01   3rd Qu.:2022-05-14  
##                     Max.   :2023-10-02   Max.   :2023-10-15  
##                                                              
##  collection_date       collect_year  collect_month     collect_day   
##  Min.   :2018-07-20   Min.   :2018   Min.   : 5.000   Min.   : 1.00  
##  1st Qu.:2019-08-21   1st Qu.:2019   1st Qu.: 6.000   1st Qu.: 9.00  
##  Median :2020-09-14   Median :2020   Median : 8.000   Median :16.00  
##  Mean   :2020-12-11   Mean   :2020   Mean   : 7.694   Mean   :16.19  
##  3rd Qu.:2022-05-14   3rd Qu.:2022   3rd Qu.: 9.000   3rd Qu.:24.00  
##  Max.   :2023-10-15   Max.   :2023   Max.   :10.000   Max.   :31.00  
##                                                                      
##     group              biomass       biomass_total    percent_biomass
##  Length:24312       Min.   : 0.000   Min.   : 0.000   Min.   : 0.00  
##  Class :character   1st Qu.: 0.130   1st Qu.: 1.610   1st Qu.: 4.55  
##  Mode  :character   Median : 0.380   Median : 3.830   Median :11.85  
##                     Mean   : 1.004   Mean   : 6.026   Mean   :16.67  
##                     3rd Qu.: 1.050   3rd Qu.: 8.150   3rd Qu.:24.94  
##                     Max.   :35.820   Max.   :43.460   Max.   :93.78  
##                                                       NA's   :72     
##  individual_count  count_total   sampling_mean_min_temp sampling_mean_max_temp
##  Min.   :   0.0   Min.   :   0   Min.   : 6.50          Min.   :11.69         
##  1st Qu.:  32.0   1st Qu.: 659   1st Qu.:10.20          1st Qu.:16.27         
##  Median :  99.0   Median :1258   Median :11.62          Median :18.19         
##  Mean   : 281.9   Mean   :1690   Mean   :11.20          Mean   :18.14         
##  3rd Qu.: 288.0   3rd Qu.:2173   3rd Qu.:12.81          3rd Qu.:20.11         
##  Max.   :8615.0   Max.   :9827   Max.   :13.55          Max.   :23.96         
##                                                                               
##  sampling_mean_mean_temp sampling_mean_precipitation sampling_mean_max_gust
##  Min.   : 9.207          Min.   :0.00000             Min.   :34.40         
##  1st Qu.:13.279          1st Qu.:0.04615             1st Qu.:38.33         
##  Median :14.929          Median :0.44286             Median :41.30         
##  Mean   :14.670          Mean   :0.90678             Mean   :41.16         
##  3rd Qu.:16.629          3rd Qu.:1.22857             3rd Qu.:43.33         
##  Max.   :18.750          Max.   :4.54286             Max.   :57.00         
##                                                                            
##  sampling_tot_precipitation year_mean_temp  year_days_below_zero
##  Min.   : 0.00              Min.   :10.09   Min.   : 5.00       
##  1st Qu.: 0.60              1st Qu.:10.44   1st Qu.: 7.00       
##  Median : 6.20              Median :10.56   Median :13.00       
##  Mean   :12.54              Mean   :10.58   Mean   :10.88       
##  3rd Qu.:17.20              3rd Qu.:10.65   3rd Qu.:14.00       
##  Max.   :63.60              Max.   :10.91   Max.   :18.00       
##                                                                 
##  year_days_above_25 year_tot_precip prev_year_winter_precip
##  Min.   : 1.000     Min.   :410.2   Min.   :221.0          
##  1st Qu.: 2.000     1st Qu.:524.0   1st Qu.:438.0          
##  Median : 9.000     Median :632.2   Median :476.4          
##  Mean   : 6.934     Mean   :681.2   Mean   :441.9          
##  3rd Qu.: 9.000     3rd Qu.:861.8   3rd Qu.:489.0          
##  Max.   :13.000     Max.   :861.8   Max.   :503.4          
##                                                            
##  prev_year_winter_temp year_growing_szn_tot_precip     buffer     
##  Min.   :5.175         Min.   :115.8               Min.   : 50.0  
##  1st Qu.:5.554         1st Qu.:266.0               1st Qu.: 87.5  
##  Median :6.154         Median :286.0               Median :175.0  
##  Mean   :6.006         Mean   :249.1               Mean   :225.0  
##  3rd Qu.:6.164         3rd Qu.:286.0               3rd Qu.:312.5  
##  Max.   :6.453         Max.   :297.2               Max.   :500.0  
##                                                                   
##    total_area      exposed_soil       grass             herb      
##  Min.   :  7845   Min.   :    0   Min.   :     0   Min.   :    0  
##  1st Qu.: 25500   1st Qu.:   97   1st Qu.:   980   1st Qu.:  541  
##  Median :113800   Median :  794   Median :  6273   Median : 3037  
##  Mean   :255088   Mean   : 3642   Mean   : 20540   Mean   : 9422  
##  3rd Qu.:343394   3rd Qu.: 5581   3rd Qu.: 32180   3rd Qu.:14363  
##  Max.   :784915   Max.   :25191   Max.   :146646   Max.   :46590  
##                                                                   
##  shrub_small_tree deciduous_tree   coniferous_tree  pavement_packed_gravel
##  Min.   :   43    Min.   :   804   Min.   :  1154   Min.   :    0         
##  1st Qu.: 1050    1st Qu.:  3378   1st Qu.:  5697   1st Qu.:  130         
##  Median : 5244    Median : 18979   Median : 26842   Median :  841         
##  Mean   :16029    Mean   : 48624   Mean   :105383   Mean   : 3731         
##  3rd Qu.:23167    3rd Qu.: 71200   3rd Qu.:131309   3rd Qu.: 5480         
##  Max.   :70352    Max.   :230305   Max.   :549144   Max.   :34899         
##                                                                           
##       road         buildings      agriculture     exposed_bedrock
##  Min.   :    0   Min.   :    0   Min.   :     0   Min.   :    0  
##  1st Qu.:    0   1st Qu.:   62   1st Qu.:     0   1st Qu.:    0  
##  Median :  892   Median :  617   Median :     0   Median :    0  
##  Mean   : 3434   Mean   : 3137   Mean   : 27881   Mean   :  533  
##  3rd Qu.: 4591   3rd Qu.: 3775   3rd Qu.: 11852   3rd Qu.:  105  
##  Max.   :25632   Max.   :37996   Max.   :345214   Max.   :22785  
##                                                                  
##      ocean             lake         aq_vegetation    sand_gravel_shore
##  Min.   :     0   Min.   :    0.0   Min.   :  0.00   Min.   :    0    
##  1st Qu.:     0   1st Qu.:    0.0   1st Qu.:  0.00   1st Qu.:    0    
##  Median :     0   Median :    0.0   Median :  0.00   Median :    0    
##  Mean   :  9969   Mean   :  930.8   Mean   : 43.33   Mean   : 1474    
##  3rd Qu.:     0   3rd Qu.:  900.0   3rd Qu.:  1.00   3rd Qu.:    0    
##  Max.   :218199   Max.   :28164.0   Max.   :906.00   Max.   :26665    
##                                                                       
##  bedrock_shore         docks         loose_gravel        natural      
##  Min.   :    0.0   Min.   : 0.000   Min.   :   0.00   Min.   :  3020  
##  1st Qu.:    0.0   1st Qu.: 0.000   1st Qu.:   0.00   1st Qu.: 13826  
##  Median :    0.0   Median : 0.000   Median :   0.00   Median : 63030  
##  Mean   :  297.6   Mean   : 2.113   Mean   :  16.48   Mean   :216903  
##  3rd Qu.:    0.0   3rd Qu.: 0.000   3rd Qu.:   0.00   3rd Qu.:246791  
##  Max.   :15989.0   Max.   :85.000   Max.   :1012.00   Max.   :777399  
##                                                                       
##  anthropogenic   
##  Min.   :     0  
##  1st Qu.:   946  
##  Median :  7708  
##  Mean   : 38186  
##  3rd Qu.: 33490  
##  Max.   :383525  
## 

Export Final Product as .csv File

write.csv(insect_counts2,"~/Downloads/2022.12.07.MetchBiomass_InsectCounts_Climate_Location_Landcover.csv", row.names = FALSE, fileEncoding = "UTF-8")

References

Blüthgen, N., Staab, M., Achury, R., & Weisser, W. W. (2022). Unravelling insect declines: Can space replace time? Biology Letters, 18(4), 20210666. https://doi.org/10.1098/rsbl.2021.0666

Gebert, F., Bollmann, K., Schuwirth, N., Duelli, P., Weber, D., & Obrist, M. K. (2024). Similar temporal patterns in insect richness, abundance and biomass across major habitat types. Insect Conservation and Diversity, 17(1), 139–154. https://doi.org/10.1111/icad.12700

Innes, F. (2024). Conservation of biodiversity: Insect species abundance and biomass trends in a rural municipality. Unpublished report. Directed Studies/BIOL490B, University of Victoria.

Hallmann, C. A., Sorg, M., Jongejans, E., Siepel, H., Hofland, N., Schwan, H., Stenmans, W., Müller, A., Sumser, H., Hörren, T., Goulson, D., & Kroon, H. de. (2017). More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLOS ONE, 12(10), e0185809. https://doi.org/10.1371/journal.pone.0185809

Hallmann, C. A., Zeegers, T., van Klink, R., Vermeulen, R., van Wielink, P., Spijkers, H., van Deijk, J., van Steenis, W., & Jongejans, E. (2020). Declining abundance of beetles, moths and caddisflies in the Netherlands. Insect Conservation and Diversity, 13(2), 127–139. https://doi.org/10.1111/icad.12377

Li, H., & Wilkins, K. T. (2022). Predator-prey relationship between urban bats and insects Impacted by both artificial light at night and spatial clutter. Biology, 11(6), Article 6. https://doi.org/10.3390/biology11060829

Müller, J., Hothorn, T., Yuan, Y., Seibold, S., Mitesser, O., Rothacher, J., Freund, J., Wild, C., Wolz, M., & Menzel, A. (2024). Weather explains the decline and rise of insect biomass over 34 years. Nature, 628(8007), 349–354. https://doi.org/10.1038/s41586-023-06402-z

Nikel, K. (2019). Estimating flying insect biomass and its spatiotemporal distribution in Metchosin: Biodiversity on southern Vancouver Island. Unpublished report. Honours Thesis, University of Victoria.

Seibold, S., Gossner, M. M., Simons, N. K., Blüthgen, N., Müller, J., Ambarlı, D., Ammer, C., Bauhus, J., Fischer, M., Habel, J. C., Linsenmair, K. E., Nauss, T., Penone, C., Prati, D., Schall, P., Schulze, E.-D., Vogt, J., Wöllauer, S., & Weisser, W. W. (2019). Arthropod decline in grasslands and forests is associated with landscape-level drivers. Nature, 574(7780), 671–674. https://doi.org/10.1038/s41586-019-1684-3

Svenningsen, C. S., Peters, B., Bowler, D. E., Dunn, R. R., Bonn, A., & Tøttrup, A. P. (2024). Insect biomass shows a stronger decrease than species richness along urban gradients. Insect Conservation and Diversity, 17(2), 182–188. https://doi.org/10.1111/icad.12694

Uhler, J., Redlich, S., Zhang, J., Hothorn, T., Tobisch, C., Ewald, J., Thorn, S., Seibold, S., Mitesser, O., Morinière, J., Bozicevic, V., Benjamin, C. S., Englmeier, J., Fricke, U., Ganuza, C., Haensel, M., Riebl, R., Rojas-Botero, S., Rummler, T., … Müller, J. (2021). Relationship of insect biomass and richness with land use along a climate gradient. Nature Communications, 12(1), 5946. https://doi.org/10.1038/s41467-021-26181-3

Uphus, L., Uhler, J., Tobisch, C., Rojas-Botero, S., Lüpke, M., Benjamin, C., Englmeier, J., Fricke, U., Ganuza, C., Haensel, M., Redlich, S., Zhang, J., Müller, J., & Menzel, A. (2023). Earlier and more uniform spring green-up linked to lower insect richness and biomass in temperate forests. Communications Biology, 6(1), 1–11. https://doi.org/10.1038/s42003-023-05422-9

Welti, E. A. R., Zajicek, P., Frenzel, M., Ayasse, M., Bornholdt, T., Buse, J., Classen, A., Dziock, F., Engelmann, R. A., Englmeier, J., Fellendorf, M., Förschler, M. I., Fricke, U., Ganuza, C., Hippke, M., Hoenselaar, G., Kaus-Thiel, A., Kerner, J., Kilian, D., … Haase, P. (2022). Temperature drives variation in flying insect biomass across a German malaise trap network. Insect Conservation and Diversity, 15(2), 168–180. https://doi.org/10.1111/icad.12555